The document discusses memory hierarchy and caching techniques. It begins by explaining the need for a memory hierarchy due to differing access times of memory technologies like SRAM, DRAM, and disk. It then covers concepts like cache hits, misses, block size, direct mapping, set associativity, compulsory misses, capacity misses, and conflict misses. Finally, it discusses using a second-level cache to reduce memory access times by capturing misses from the first-level cache.
Cache recap
1. Computation I pg 1
Memory Hierarchy, why?
• Users want large and fast memories!
SRAM access times are 1 – 10 ns
DRAM access times are 20 – 120 ns
Disk access times are 5 to 10 million ns, but its bits are very cheap
• Get the best of both worlds, fast and large memories:
– build a memory hierarchy
[Figure: the CPU connected to a stack of memory levels (Level 1, Level 2, ..., Level n), annotated with size and speed]
2. Computation I pg 2
Memory recap
• We can build a memory – a logical k × m array of
stored bits. Usually m = 8 bits / location
[Figure: memory array addressed by an n-bit address, with k = 2^n locations and m bits of data per entry]
Address Space:
number of locations
(usually a power of 2)
Addressability:
m: number of bits per location
(e.g., byte-addressable)
3. Computation I pg 3
• SRAM:
– value is stored with a pair of inverting gates
– very fast but takes up more space than DRAM (4 to 6
transistors)
• DRAM:
– value is stored as a charge on capacitor (must be
refreshed)
– very small but slower than SRAM (factor of 5 to 10)
– charge leaks => refresh needed
Memory element: SRAM vs DRAM
[Figure: DRAM cell with word line, pass transistor, capacitor, and bit line]
5. Computation I pg 5
Exploiting Locality
• Locality = principle that makes having a memory hierarchy a good idea
• If an item is referenced,
temporal locality: it will tend to be referenced again soon
spatial locality : nearby items will tend to be referenced soon.
Why does code have locality?
• Our initial focus: two levels (upper, lower)
– block: minimum unit of data
– hit: data requested is in the upper level
– miss: data requested is not in the upper level
[Figure: a block being transferred between the lower level and the upper level ($, the cache)]
6. Computation I pg 6
Cache operation
[Figure: blocks (lines) copied from memory (the lower level) into the cache (the higher level), which stores tags alongside the data]
7. Computation I pg 7
• Mapping: cache address is memory address modulo the
number of blocks in the cache
Direct Mapped Cache
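[Figure: an 8-block cache (slots 000 to 111) and memory blocks 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101; each memory block maps to the cache slot given by its three low-order address bits]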
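As a minimal sketch of the modulo mapping above (the function name and the choice of an 8-block cache are illustrative assumptions taken from the figure, not part of the slides), the slot for each memory block can be computed like this:

```python
def direct_mapped_index(block_address: int, num_blocks: int) -> int:
    """Return the cache slot a memory block maps to in a direct-mapped cache."""
    return block_address % num_blocks


# Assumed setup: the 8-block cache and the eight memory blocks from the figure.
NUM_BLOCKS = 8
MEMORY_BLOCKS = (0b00001, 0b00101, 0b01001, 0b01101,
                 0b10001, 0b10101, 0b11001, 0b11101)

for addr in MEMORY_BLOCKS:
    slot = direct_mapped_index(addr, NUM_BLOCKS)
    print(f"memory block {addr:05b} -> cache slot {slot:03b}")
```

Because the number of blocks is a power of two, the modulo simply keeps the low-order address bits, so every block ending in 001 competes for slot 001 and every block ending in 101 competes for slot 101.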
8. Computation I pg 8
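Q: What kind of locality are we taking advantage of in this example?
Direct Mapped Cache
[Figure: direct-mapped cache with 1024 one-word entries (index 0 to 1023). The 32-bit address is split into a 20-bit tag (bits 31–12), a 10-bit index (bits 11–2), and a 2-bit byte offset (bits 1–0). Each entry holds a valid bit, a 20-bit tag, and 32 bits of data; the outputs are Hit and Data.]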
9. Computation I pg 9
• This example exploits (also) spatial locality (having
larger blocks):
Direct Mapped Cache
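[Figure: direct-mapped cache with 4K entries and 128-bit (four-word) blocks. The 32-bit address is split into a 16-bit tag (bits 31–16), a 12-bit index (bits 15–4), a 2-bit block offset (bits 3–2), and a 2-bit byte offset (bits 1–0). Each entry holds a valid bit, a 16-bit tag, and 128 bits of data; a multiplexor selects one of the four 32-bit words using the block offset. The outputs are Hit and Data.]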
10. Computation I pg 10
• Read hits
– this is what we want!
• Read misses
– stall the CPU, fetch block from memory, deliver to cache, restart the
load instruction
• Write hits:
– can replace data in cache and memory (write-through)
– write the data only into the cache (write-back the cache later)
• Write misses:
– read the entire block into the cache, then write the word (allocate on
write miss)
– do not read the cache line; just write to memory (no allocate on write
miss)
Hits vs. Misses
11. Computation I pg 11
Splitting first level cache
• Use split Instruction and Data caches
– Caches can be tuned differently
– Avoids dual ported cache
Program  Block size (words)  Instruction miss rate  Data miss rate  Effective combined miss rate
gcc      1                   6.1%                   2.1%            5.4%
gcc      4                   2.0%                   1.7%            1.9%
spice    1                   1.2%                   1.3%            1.2%
spice    4                   0.3%                   0.6%            0.4%
[Figure: CPU with split L1 instruction (I$) and data (D$) caches, backed by a unified (I&D) L2 cache and main memory]
13. Computation I pg 13
Performance example (1)
• Assume application with:
– Icache missrate 2%
– Dcache missrate 4%
– Fraction of ld-st instructions = 36%
– CPI ideal (i.e. without cache misses) is 2.0
– Misspenalty 40 cycles
• Calculate CPI taking misses into account
CPI = 2.0 + CPIstall
CPIstall = Instruction-miss cycles + Data-miss cycles
Instruction-miss cycles = Ninstr x 0.02 x 40 = 0.80 Ninstr
Data-miss cycles = Ninstr x %ld-st x 0.04 x 40
CPI = 3.36
Slowdown: 1.68 !!
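The calculation above can be checked with a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration. Note that the unrounded arithmetic gives a CPI of 3.376 and a slowdown of about 1.69; the slide's 3.36 and 1.68 appear to come from rounding the data-miss term down to 0.56.

```python
def cpi_with_misses(cpi_ideal, icache_miss_rate, dcache_miss_rate,
                    load_store_fraction, miss_penalty):
    """Return (instruction stall CPI, data stall CPI, total CPI)."""
    # Every instruction is fetched, so every instruction can miss in the I-cache.
    instr_stall = icache_miss_rate * miss_penalty
    # Only loads and stores access the D-cache.
    data_stall = load_store_fraction * dcache_miss_rate * miss_penalty
    return instr_stall, data_stall, cpi_ideal + instr_stall + data_stall


instr_stall, data_stall, cpi = cpi_with_misses(
    cpi_ideal=2.0, icache_miss_rate=0.02, dcache_miss_rate=0.04,
    load_store_fraction=0.36, miss_penalty=40)

print(f"instruction-miss cycles per instruction: {instr_stall:.3f}")
print(f"data-miss cycles per instruction:        {data_stall:.3f}")
print(f"CPI = {cpi:.3f}, slowdown = {cpi / 2.0:.3f}")
```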
14. Computation I pg 14
Performance example (2)
1. What if ideal processor had CPI = 1.0 (instead of 2.0)
• Slowdown would be 2.36 !
2. What if processor is clocked twice as fast
• => penalty becomes 80 cycles
• CPI = 4.75
• Speedup = (N × CPI_a × Tclock) / (N × CPI_b × Tclock / 2) = 3.36 / (4.75 / 2)
• Speedup is not 2, but only 1.41 !!
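Both what-if questions can be replayed with the sketch below, which assumes the same miss rates and load/store fraction as performance example (1); the helper name is hypothetical. With unrounded arithmetic the results differ slightly from the slide (a slowdown of about 2.38 and a speedup of about 1.42 instead of 2.36 and 1.41), but the conclusion is the same: doubling the clock rate gives far less than a 2x speedup.

```python
def cpi_with_stalls(cpi_ideal, miss_penalty,
                    icache_miss_rate=0.02, dcache_miss_rate=0.04,
                    load_store_fraction=0.36):
    """CPI including instruction-cache and data-cache miss stalls."""
    stalls_per_instr = (icache_miss_rate
                        + load_store_fraction * dcache_miss_rate) * miss_penalty
    return cpi_ideal + stalls_per_instr


# Baseline from example (1): ideal CPI 2.0, miss penalty 40 cycles.
cpi_base = cpi_with_stalls(2.0, 40)

# Question 1: the same memory system behind a processor with ideal CPI 1.0.
slowdown_ideal_one = cpi_with_stalls(1.0, 40) / 1.0

# Question 2: clock twice as fast, so the same memory latency costs 80 cycles.
cpi_fast_clock = cpi_with_stalls(2.0, 80)
# Execution time = N * CPI * Tclock; the faster machine has Tclock / 2.
speedup = cpi_base / (cpi_fast_clock / 2)

print(f"slowdown with ideal CPI 1.0: {slowdown_ideal_one:.2f}")
print(f"CPI at double clock rate:    {cpi_fast_clock:.2f}")
print(f"speedup from doubling clock: {speedup:.2f}")
```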
15. Computation I pg 15
Improving cache / memory performance
• Ways of improving performance:
– decreasing the miss ratio (avoiding conflicts): associativity
– decreasing the miss penalty: multilevel caches
– Adapting block size: see earlier slides
– Note: there are many more ways to improve memory
performance
(see e.g. master course 5MD00)
16. Computation I pg 16
How to reduce CPIstall ?
CPIstall = %reads × missrate_read × misspenalty_read + %writes × missrate_write × misspenalty_write
Reduce missrate:
• Larger cache
– Avoids capacity misses
– However: a large cache may increase Tcycle
• Larger block (line) size
– Exploits spatial locality: see previous lecture
• Associative cache
– Avoids conflict misses
Reduce misspenalty:
• Add 2nd level of cache
17. Computation I pg 17
Decreasing miss ratio with associativity
[Figure: the same 8-block cache organised in four ways, each entry holding a tag and data:
– one-way set associative (direct mapped): 8 sets (blocks 0 to 7) of 1 block
– two-way set associative: 4 sets (0 to 3) of 2 blocks
– four-way set associative: 2 sets (0 and 1) of 4 blocks
– eight-way set associative (fully associative): 1 set of 8 blocks]
18. Computation I pg 18
An implementation: 4 way associative
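[Figure: four-way set-associative cache with 256 sets (index 0 to 255). The 32-bit address supplies a 22-bit tag (bits 31–10) and an 8-bit index (bits 9–2). Each of the four ways holds a valid bit, a 22-bit tag, and 32 bits of data; a 4-to-1 multiplexor selects the data of the matching way. The outputs are Hit and Data.]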
20. Computation I pg 20
Further Cache Basics
• cache_size = Nsets x Associativity x Block_size
• block_address = Byte_address DIV Block_size (in bytes)
• index = Block_address MOD Nsets
• Because the block size and the number of sets are
(usually) powers of two, DIV and MOD can be performed
efficiently
[Figure: address bits 31 ... 0 split into tag, index, and block offset; tag and index together form the block address]
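A short sketch of the address split described above, first with DIV and MOD and then with the equivalent shifts and masks that the power-of-two sizes allow. The function names and the example address are assumptions chosen for illustration.

```python
def split_address(byte_address, block_size_bytes, num_sets):
    """Split a byte address into (tag, index, block offset) using DIV and MOD."""
    block_address = byte_address // block_size_bytes   # DIV
    offset = byte_address % block_size_bytes
    index = block_address % num_sets                   # MOD
    tag = block_address // num_sets
    return tag, index, offset


def split_address_bitwise(byte_address, block_size_bytes, num_sets):
    """Same split with shifts and masks; assumes both sizes are powers of two."""
    offset_bits = block_size_bytes.bit_length() - 1    # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    offset = byte_address & (block_size_bytes - 1)
    index = (byte_address >> offset_bits) & (num_sets - 1)
    tag = byte_address >> (offset_bits + index_bits)
    return tag, index, offset


# Hypothetical example: 16-byte blocks and 4096 sets.
tag, index, offset = split_address(0x12345678, 16, 4096)
print(f"tag={tag:#x} index={index:#x} offset={offset:#x}")
```

With these sizes the lowest hex digit is the block offset, the next three digits are the index, and the remaining digits are the tag, which is why hardware can extract the fields by wiring alone.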
21. Computation I pg 21
Comparing different (1-level) caches (1)
• Assume
– Cache of 4K blocks
– 4 word block size
– 32 bit address
• Direct mapped (associativity=1) :
– 16 bytes per block = 2^4
– 32 bit address : 32-4=28 bits for index and tag
– #sets=#blocks/ associativity : log2 of 4K=12 : 12 for index
– Total number of tag bits : (28-12)*4K=64 Kbits
• 2-way associative
– #sets=#blocks/associativity : 2K sets
– 1 bit less for indexing, 1 bit more for tag
– Tag bits : (28-11) * 2 * 2K=68 Kbits
• 4-way associative
– #sets=#blocks/associativity : 1K sets
– 1 bit less for indexing, 1 bit more for tag
– Tag bits : (28-10) * 4 * 1K=72 Kbits
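The tag-storage comparison above can be reproduced with a small helper. This is a sketch with an invented function name; it assumes power-of-two sizes and counts tag bits only (no valid or dirty bits).

```python
def total_tag_bits(num_blocks, block_size_bytes, associativity, address_bits=32):
    """Total tag storage in bits for a cache with power-of-two dimensions."""
    num_sets = num_blocks // associativity
    offset_bits = block_size_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1            # log2(number of sets)
    tag_bits = address_bits - offset_bits - index_bits
    return tag_bits * num_blocks                      # one tag per block


# The slide's cache: 4K blocks of 4 words (16 bytes), 32-bit addresses.
for ways in (1, 2, 4):
    kbits = total_tag_bits(4096, 16, ways) // 1024
    print(f"{ways}-way: {kbits} Kbits of tag storage")
```

Each doubling of associativity halves the number of sets, which moves one bit from the index to the tag and so adds 4 Kbits of tag storage for this cache.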
22. Computation I pg 22
Comparing different (1-level) caches (2)
3 caches consisting of 4 one-word blocks:
• Cache 1 : fully associative
• Cache 2 : two-way set associative
• Cache 3 : direct mapped
Suppose following sequence of block
addresses: 0, 8, 0, 6, 8
23. Computation I pg 23
Direct Mapped
Block address  Cache block
0              0 mod 4 = 0
6              6 mod 4 = 2
8              8 mod 4 = 0

Address of memory block  Hit or miss  Location 0  Location 1  Location 2  Location 3
0                        miss         Mem[0]
8                        miss         Mem[8]
0                        miss         Mem[0]
6                        miss         Mem[0]                  Mem[6]
8                        miss         Mem[8]                  Mem[6]
(In the original slide, coloured entries mark the new entry loaded on each miss.)
24. Computation I pg 24
2-way Set Associative: 2 sets
Block address  Cache set
0              0 mod 2 = 0
6              6 mod 2 = 0
8              8 mod 2 = 0

Address of memory block  Hit or miss  Set 0, entry 0  Set 0, entry 1  Set 1, entry 0  Set 1, entry 1
0                        miss         Mem[0]
8                        miss         Mem[0]          Mem[8]
0                        hit          Mem[0]          Mem[8]
6                        miss         Mem[0]          Mem[6]
8                        miss         Mem[8]          Mem[6]
On each miss in a full set, the least recently used block is replaced (all three addresses map to set 0).
25. Computation I pg 25
Fully associative
(4 way assoc., 1 set)
Address of memory block  Hit or miss  Block 0  Block 1  Block 2  Block 3
0                        miss         Mem[0]
8                        miss         Mem[0]   Mem[8]
0                        hit          Mem[0]   Mem[8]
6                        miss         Mem[0]   Mem[8]   Mem[6]
8                        hit          Mem[0]   Mem[8]   Mem[6]
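The three traces above can be replayed with a small set-associative simulator. This is a minimal sketch assuming LRU replacement (as stated for the two-way case) and one-word blocks; the function name is illustrative.

```python
from collections import OrderedDict


def count_misses(block_addresses, num_blocks, associativity):
    """Count misses for a cache of num_blocks blocks with LRU replacement."""
    num_sets = num_blocks // associativity
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in block_addresses:
        cache_set = sets[addr % num_sets]
        if addr in cache_set:
            cache_set.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) == associativity:
                cache_set.popitem(last=False)  # evict the least recently used
            cache_set[addr] = True
    return misses


trace = [0, 8, 0, 6, 8]
for name, ways in (("direct mapped", 1), ("two-way", 2), ("fully associative", 4)):
    print(f"{name}: {count_misses(trace, 4, ways)} misses")
```

For this sequence the direct-mapped cache misses on all five accesses, the two-way cache on four, and the fully associative cache on three, matching the tables above.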
26. Computation I pg 26
Review: Four Questions for Memory
Hierarchy Designers
•Q1: Where can a block be placed in the upper
level? (Block placement)
– Fully Associative, Set Associative, Direct Mapped
•Q2: How is a block found if it is in the upper
level?
(Block identification)
– Tag/Block
•Q3: Which block should be replaced on a miss?
(Block replacement)
– Random, FIFO, LRU
•Q4: What happens on a write?
(Write strategy)
– Write Back or Write Through (with Write Buffer)
27. Computation I pg 27
Classifying Misses: the 3 Cs
•The 3 Cs:
– Compulsory—First access to a block is always a
miss. Also called cold start misses
• misses in infinite cache
– Capacity—Misses resulting from the finite
capacity of the cache
• misses in fully associative cache with optimal replacement strategy
– Conflict—Misses occurring because several blocks
map to the same set. Also called collision misses
• remaining misses
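The 3 Cs definitions above translate into a simple accounting scheme. The sketch below is an approximation under stated assumptions: it uses LRU in the fully associative reference cache instead of the optimal replacement strategy the slide names, so on some traces the capacity and conflict counts can differ from the strict definition (the conflict count can even come out negative). The function names and the trace, borrowed from the earlier comparison example, are illustrative.

```python
from collections import OrderedDict


def lru_misses(trace, num_blocks, associativity):
    """Misses of a set-associative cache with LRU replacement."""
    num_sets = num_blocks // associativity
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        cache_set = sets[addr % num_sets]
        if addr in cache_set:
            cache_set.move_to_end(addr)
        else:
            misses += 1
            if len(cache_set) == associativity:
                cache_set.popitem(last=False)
            cache_set[addr] = True
    return misses


def classify_misses(trace, num_blocks, associativity):
    """Split the misses of a cache into compulsory, capacity, and conflict."""
    total = lru_misses(trace, num_blocks, associativity)
    # Compulsory: first touch of each block (the misses of an infinite cache).
    compulsory = len(set(trace))
    # Capacity: extra misses of a fully associative cache of the same size.
    # Assumption: LRU stands in for the optimal replacement strategy.
    fully_associative = lru_misses(trace, num_blocks, num_blocks)
    capacity = fully_associative - compulsory
    # Conflict: whatever remains.
    conflict = total - fully_associative
    return {"compulsory": compulsory, "capacity": capacity, "conflict": conflict}


result = classify_misses([0, 8, 0, 6, 8], num_blocks=4, associativity=1)
print(result)
```

For the direct-mapped four-block cache, three of the five misses are compulsory and the other two are conflict misses, since blocks 0 and 8 keep evicting each other from the same slot.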
28. Computation I pg 28
3 Cs: Compulsory, Capacity, Conflict
In all cases, assume total cache size not changed
What happens if we:
1) Change Block Size:
Which of 3Cs is obviously affected? compulsory misses
2) Change Cache Size:
Which of 3Cs is obviously affected? capacity misses
3) Introduce higher associativity:
Which of 3Cs is obviously affected? conflict misses
29. Computation I pg 29
[Figure: 3Cs Absolute Miss Rate (SPEC92). Miss rate per type (y-axis, 0 to 0.14) against cache size (x-axis, 1 KB to 128 KB), showing compulsory and capacity misses plus the conflict misses for 1-way, 2-way, 4-way, and 8-way associativity]
30. Computation I pg 30
Second Level Cache (L2)
• Most CPUs
– have an L1 cache small enough to match the cycle time
(reduce the time to hit the cache)
– have an L2 cache large enough and with sufficient
associativity to capture most memory accesses (reduce
miss rate)
• L2 Equations, Average Memory Access Time (AMAT):
AMAT = HitTime_L1 + MissRate_L1 x MissPenalty_L1
MissPenalty_L1 = HitTime_L2 + MissRate_L2 x MissPenalty_L2
AMAT = HitTime_L1 + MissRate_L1 x (HitTime_L2 + MissRate_L2 x MissPenalty_L2)
• Definitions:
– Local miss rate: misses in this cache divided by the total number
of memory accesses to this cache (MissRate_L2)
– Global miss rate: misses in this cache divided by the total number
of memory accesses generated by the CPU
(MissRate_L1 x MissRate_L2)
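The two-level AMAT equations and the local/global distinction can be expressed directly in code. In this sketch the function names and all numeric inputs are illustrative assumptions, not values from the slides.

```python
def amat_two_level(hit_time_l1, miss_rate_l1,
                   hit_time_l2, local_miss_rate_l2, miss_penalty_l2):
    """Average memory access time for an L1 cache backed by an L2 cache."""
    # An L1 miss costs an L2 access, plus a memory access if L2 also misses.
    miss_penalty_l1 = hit_time_l2 + local_miss_rate_l2 * miss_penalty_l2
    return hit_time_l1 + miss_rate_l1 * miss_penalty_l1


def global_miss_rate_l2(miss_rate_l1, local_miss_rate_l2):
    """Fraction of all CPU accesses that miss in both L1 and L2."""
    return miss_rate_l1 * local_miss_rate_l2


# Assumed numbers (in cycles): L1 hit 1, L2 hit 10, memory 100,
# L1 miss rate 5%, local L2 miss rate 40%.
amat = amat_two_level(1, 0.05, 10, 0.40, 100)
global_rate = global_miss_rate_l2(0.05, 0.40)
print(f"AMAT = {amat:.2f} cycles, global L2 miss rate = {global_rate:.1%}")
```

With these assumed numbers the local L2 miss rate looks poor at 40%, yet only 2% of all CPU accesses reach main memory, which illustrates why the global miss rate is usually the more informative figure.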
31. Computation I pg 31
Second Level Cache (L2)
• Suppose processor with base CPI of 1.0
• Clock rate of 500 MHz
• Main memory access time: 200 ns
• Miss rate per instruction of the primary cache: 5%
What is the improvement from adding a second-level cache with a 20 ns access time that reduces the miss rate to main memory to 2%?
• Miss penalty : 200 ns/ 2ns per cycle=100 clock cycles
• Effective CPI=base CPI+ memory stall per instruction = ?
– 1 level cache : total CPI=1+5%*100=6
– 2 level cache : a miss in first level cache is satisfied by second cache or
memory
• Access second level cache : 20 ns / 2ns per cycle=10 clock cycles
• If miss in second cache, then access memory : in 2% of the cases
• Total CPI=1+primary stalls per instruction +secondary stalls per instruction
• Total CPI=1+5%*10+2%*100=3.5
Machine with L2 cache : 6/3.5=1.7 times faster
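The worked example above can be reproduced as follows. This is a sketch with made-up names; it treats the 2% as a global miss rate (misses per instruction that go all the way to main memory), as the slide's arithmetic does.

```python
def effective_cpi(base_cpi, clock_mhz, memory_ns, l1_miss_rate,
                  l2_ns=None, global_l2_miss_rate=None):
    """Effective CPI with one cache level, or two if L2 parameters are given."""
    cycle_ns = 1000 / clock_mhz                 # 500 MHz -> 2 ns per cycle
    memory_penalty = memory_ns / cycle_ns       # cycles to reach main memory
    if l2_ns is None:
        # Single-level cache: every L1 miss goes to main memory.
        return base_cpi + l1_miss_rate * memory_penalty
    l2_penalty = l2_ns / cycle_ns               # cycles to reach the L2 cache
    # Every L1 miss pays the L2 access; misses in L2 also pay the memory access.
    return (base_cpi
            + l1_miss_rate * l2_penalty
            + global_l2_miss_rate * memory_penalty)


cpi_one_level = effective_cpi(1.0, 500, 200, 0.05)
cpi_two_level = effective_cpi(1.0, 500, 200, 0.05,
                              l2_ns=20, global_l2_miss_rate=0.02)
print(f"1-level CPI = {cpi_one_level:.1f}")
print(f"2-level CPI = {cpi_two_level:.1f}")
print(f"speedup = {cpi_one_level / cpi_two_level:.2f}")
```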
32. Computation I pg 32
Second Level Cache
• The global miss rate is similar to the miss rate the second-level cache would have as a single cache, provided the L2 cache is much bigger than L1.
• The local miss rate is NOT a good measure of secondary caches, as it is a function of the L1 cache.
The global miss rate should be used.
34. Computation I pg 34
• Make reading multiple words easier by using banks of memory
• It can get a lot more complicated...
How to connect the cache to next level?
[Figure: three ways to connect CPU, cache, bus, and memory: (a) one-word-wide memory organization; (b) wide memory organization, with a multiplexor; (c) interleaved memory organization, with memory banks 0 to 3 sharing the bus]