The document discusses cache memory organization and how to write cache-friendly code. It describes the typical levels of cache memory (L1, L2, L3) and how they are organized. There are three main types of cache organization: direct-mapped, set-associative, and fully associative. The document provides examples of accessing each type of cache and discusses issues with writes. It emphasizes the importance of exploiting spatial and temporal locality when writing code to minimize cache misses.
Explains cache memory with a diagram and demonstrates hit ratio and miss penalty with an example. Discusses the types of cache mapping: direct mapping, fully associative mapping, and set-associative mapping. Covers temporal and spatial locality of reference, the write-through and write-back cache write policies, and the differences between a unified cache and a split cache.
4. Cache Memory
• History
  – At the very beginning, 3 levels: registers, main memory, disk storage
  – 10 years later, 4 levels: registers, SRAM cache, main DRAM memory, disk storage
  – Modern processors, 4-5 levels: registers, SRAM L1, L2 (, L3) caches, main DRAM memory, disk storage
• Cache memories
  – are small, fast SRAM-based memories
  – are managed automatically by hardware
  – can be on-chip, on-die, or off-chip
5. Cache Memory
Figure 6.24 P488: Typical bus structure for cache memories. The L1 cache sits on the CPU chip next to the register file and ALU; the bus interface reaches the L2 cache over a dedicated cache bus; main memory and the I/O bridge sit behind the system bus and memory bus.
6. Cache Memory
• L1 cache is on-chip
• L2 cache was off-chip until a few years ago
• L3 cache can be off-chip or on-chip
• The CPU looks first for data in L1, then in L2, then in main memory
  – Frequently accessed blocks of main memory are held in the caches
7. Inserting an L1 cache between the CPU and main memory
• The tiny, very fast CPU register file has room for four 4-byte words.
• The small, fast L1 cache has room for two 4-word blocks (line 0 and line 1).
• The big, slow main memory has room for many 4-word blocks (e.g., block 10 holds a b c d, block 21 holds p q r s, block 30 holds w x y z).
• The transfer unit between the cache and main memory is a 4-word block (16 bytes).
• The transfer unit between the CPU register file and the cache is a 4-byte block.
8. 6.4.1 Generic Cache Memory Organization
Figure 6.25 P488: A cache is organized as S = 2^s sets; each set contains E lines; each line holds a block of B = 2^b bytes of data, plus t tag bits and 1 valid bit per line.
• A cache is an array of sets.
• Each set contains one or more lines.
• Each line holds a block of data.
9. Addressing caches
Figure 6.25 P488: An m-bit address A is divided into three fields: t tag bits, s set index bits, and b block offset bits, laid out as <tag> <set index> <block offset> from bit m-1 down to bit 0.
• The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>.
• The word contents begin at offset <block offset> bytes from the beginning of the block.
11. Cache Memory
• Derived quantities
  Parameter         Description
  M = 2^m           Maximum number of unique memory addresses
  s = log2(S)       Number of set index bits
  b = log2(B)       Number of block offset bits
  t = m - (s + b)   Number of tag bits
  C = B × E × S     Cache size (bytes), not including overhead such as the valid and tag bits
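As a concrete illustration of these address fields, the decomposition can be sketched in C. This is a minimal sketch, not code from the slides; the struct and function names are made up for illustration, and s and b are supplied by the caller:

```c
#include <stdint.h>

/* Hypothetical holder for the three address fields. */
typedef struct {
    uint64_t tag;
    uint64_t set_index;
    uint64_t block_offset;
} cache_addr_t;

/* Split an address into tag, set index, and block offset:
 * the low b bits, the next s bits, and the remaining t = m-(s+b) bits. */
cache_addr_t decompose(uint64_t addr, unsigned s, unsigned b)
{
    cache_addr_t f;
    f.block_offset = addr & ((1ULL << b) - 1);        /* low b bits  */
    f.set_index    = (addr >> b) & ((1ULL << s) - 1); /* next s bits */
    f.tag          = addr >> (s + b);                 /* remaining   */
    return f;
}
```

For example, with s = 4 and b = 4, address 0x1234 splits into block offset 0x4, set index 0x3, and tag 0x12.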
12. 6.4.2 Direct-mapped cache
Figure 6.27 P490: Each of the sets 0 through S-1 consists of one valid bit, one tag, and one cache block (E = 1 line per set).
• Simplest kind of cache
• Characterized by exactly one line per set
13. Accessing direct-mapped caches
Figure 6.28 P491: The s set index bits of the address (e.g., 00001) select exactly one of the sets; the t tag bits and b block offset bits are used in the following steps.
• Set selection
  – Use the set index bits to determine the set of interest
14. Accessing direct-mapped caches
• Line matching and word extraction
  – Find a valid line in the selected set with a matching tag (line matching)
  – Then extract the word (word selection)
15. Accessing direct-mapped caches
Figure 6.29 P491: Line matching and word selection in a direct-mapped cache. The selected set i holds a valid line with tag 0110 whose block bytes w0-w3 occupy byte offsets 4-7.
(1) The valid bit must be set.
(2) The tag bits in the cache line must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and the block offset selects the starting byte.
16. Line Replacement on Misses in Direct-Mapped Caches
• If the cache misses
  – Retrieve the requested block from the next level in the memory hierarchy
  – Store the new block in one of the cache lines of the set indicated by the set index bits
17. Line Replacement on Misses in Direct-Mapped Caches
• If the set is full of valid cache lines
  – One of the existing lines must be evicted
• For a direct-mapped cache
  – Each set contains only one line
  – The current line is replaced by the newly fetched line
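The whole lookup-and-replace cycle can be sketched as a toy direct-mapped simulator. This is hypothetical illustration code, not from the slides: it assumes a tiny cache with S = 4 sets and B = 4-byte blocks, and tracks only valid bits and tags, not the block data itself:

```c
#include <stdint.h>
#include <string.h>

#define NSETS      4   /* S = 4 sets, so s = 2 index bits (toy size) */
#define BLOCK_BITS 2   /* B = 4 bytes per block, so b = 2            */
#define SET_BITS   2   /* s = log2(NSETS)                            */

typedef struct {
    int      valid[NSETS];
    uint64_t tag[NSETS];
} dm_cache_t;

void dm_init(dm_cache_t *c) { memset(c, 0, sizeof *c); }

/* Return 1 on a hit, 0 on a miss. On a miss the new block is
 * installed, evicting whatever occupied the set, since a
 * direct-mapped set has exactly one line. */
int dm_access(dm_cache_t *c, uint64_t addr)
{
    uint64_t set = (addr >> BLOCK_BITS) & (NSETS - 1);
    uint64_t tag = addr >> (BLOCK_BITS + SET_BITS);
    if (c->valid[set] && c->tag[set] == tag)
        return 1;                 /* line matching succeeded: hit */
    c->valid[set] = 1;            /* miss: fetch and install,     */
    c->tag[set]   = tag;          /* replacing the current line   */
    return 0;
}
```

Accessing address 0, then 0 again, then 64 (same set, different tag), then 0 again gives miss, hit, miss, miss: the two conflicting blocks keep evicting each other, which is exactly the thrashing behavior the slides describe.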
21. Why use middle bits as the index?
Figure 6.31 P497: A 4-line cache indexed by the high-order bits versus the middle-order bits of the 4-bit block addresses 0000-1111.
• High-order bit indexing
  – Adjacent memory lines would map to the same cache entry
  – Poor use of spatial locality
• Middle-order bit indexing
  – Consecutive memory lines map to different cache lines
  – The cache can hold a C-byte region of the address space at one time
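The contrast can be checked with a tiny sketch mirroring the figure's setup (hypothetical code, not from the slides; it models the 4-line cache and 4-bit block numbers directly):

```c
/* Index a 4-line cache (2 index bits) by 4-bit block numbers
 * 0000..1111, as in Figure 6.31. Toy illustration only. */

/* Middle-order indexing: the low 2 bits of the block number
 * (the "middle" of the full byte address, just above the
 * block offset bits). */
unsigned middle_index(unsigned block) { return block & 0x3; }

/* High-order indexing: the top 2 bits of the 4-bit block number. */
unsigned high_index(unsigned block) { return (block >> 2) & 0x3; }
```

Blocks 0000-0011 get middle indices 0, 1, 2, 3 (four different cache lines), but all four get high index 0, so a sequential scan under high-order indexing would thrash one cache entry while the other three sit idle.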
22. 6.4.3 Set associative caches
Figure 6.32 P498: Each of the sets 0 through S-1 holds E = 2 lines, each with a valid bit, a tag, and a cache block.
• Characterized by more than one line per set
23. Accessing set associative caches
Figure 6.33 P498: The s set index bits of the address (e.g., 00001) select one set, exactly as in a direct-mapped cache; each set now contains two cache blocks.
• Set selection
  – Identical to a direct-mapped cache
24. Accessing set associative caches
Figure 6.34 P499: Line matching and word selection in a set associative cache. The selected set i holds two valid lines with tags 0110 and 1001; the address tag 0110 matches the first line, whose block bytes w0-w3 occupy byte offsets 4-7.
• Line matching and word selection
  – Must compare the tag in each valid line in the selected set
(1) The valid bit must be set.
(2) The tag bits in one of the cache lines must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and the block offset selects the starting byte.
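A two-way version of the earlier lookup can be sketched as a toy simulator (hypothetical illustration, not from the slides; sizes are made up, and a single LRU bit per set stands in for a real replacement policy):

```c
#include <stdint.h>
#include <string.h>

#define NSETS      2   /* S = 2 sets, so s = 1 index bit (toy size) */
#define NWAYS      2   /* E = 2 lines per set                       */
#define BLOCK_BITS 2   /* B = 4 bytes per block                     */
#define SET_BITS   1   /* s = log2(NSETS)                           */

typedef struct {
    int      valid[NSETS][NWAYS];
    uint64_t tag[NSETS][NWAYS];
    int      lru[NSETS];   /* which way is least recently used */
} sa_cache_t;

void sa_init(sa_cache_t *c) { memset(c, 0, sizeof *c); }

/* Return 1 on a hit, 0 on a miss; a miss evicts the LRU way. */
int sa_access(sa_cache_t *c, uint64_t addr)
{
    uint64_t set = (addr >> BLOCK_BITS) & (NSETS - 1);
    uint64_t tag = addr >> (BLOCK_BITS + SET_BITS);
    for (int w = 0; w < NWAYS; w++)
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->lru[set] = 1 - w;   /* the other way becomes LRU */
            return 1;
        }
    int victim = c->lru[set];      /* evict the LRU line */
    c->valid[set][victim] = 1;
    c->tag[set][victim]   = tag;
    c->lru[set] = 1 - victim;
    return 0;
}
```

The access sequence 0, 8, 0 (two blocks that map to the same set) gives miss, miss, hit: both blocks coexist in the two ways, whereas a direct-mapped cache would thrash between them.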
25. 6.4.4 Fully associative caches
Figures 6.35 and 6.36 P500: A single set 0 holds E = C/B lines (the one and only set), each with a valid bit, a tag, and a cache block; the address contains only t tag bits and b block offset bits.
• Characterized by a single set that contains all of the lines
• No set index bits in the address
26. Accessing fully associative caches
Figure 6.37 P500: Line matching and word selection in a fully associative cache. Among the lines (with tags such as 0110, 1001, and 1110, some valid and some not), the address tag 0110 matches a valid line whose block bytes w0-w3 occupy byte offsets 4-7.
• Word selection
  – Must compare the tag in each valid line
(1) The valid bit must be set.
(2) The tag bits in one of the cache lines must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and the block offset selects the starting byte.
27. 6.4.5 Issues with Writes
• Write hits
  – 1) Write through
    • The cache updates its copy
    • Immediately writes the corresponding cache block to memory
  – 2) Write back
    • Defers the memory update as long as possible
    • Writes the updated block to memory only when it is evicted from the cache
    • Maintains a dirty bit for each cache line
28. Issues with Writes
• Write misses
  – 1) Write-allocate
    • Loads the corresponding memory block into the cache
    • Then updates the cache block
  – 2) No-write-allocate
    • Bypasses the cache
    • Writes the word directly to memory
• Typical combinations
  – Write through + no-write-allocate
  – Write back + write-allocate
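The difference in memory traffic between the two write-hit policies can be sketched with simple counters. This is a hypothetical model, not from the slides: it assumes every store hits the cache and counts only writes that reach memory:

```c
/* Toy model of memory-bus writes under the two write-hit policies. */
typedef struct {
    long mem_writes;   /* writes that actually reach main memory */
    int  dirty;        /* dirty bit (used by write-back only)    */
} line_t;

/* Write through: every store is immediately propagated to memory. */
void store_wt(line_t *l) { l->mem_writes++; }

/* Write back: a store only marks the line dirty ... */
void store_wb(line_t *l) { l->dirty = 1; }

/* ... and memory is updated once, when the dirty line is evicted. */
void evict_wb(line_t *l)
{
    if (l->dirty) {
        l->mem_writes++;
        l->dirty = 0;
    }
}
```

Ten stores to the same line cost ten memory writes under write through but only one (at eviction) under write back, which is why write back defers the update as long as possible.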
29. 6.4.6 Multi-level caches
Figure 6.38 P504: A typical multi-level hierarchy (larger, slower, cheaper moving outward):

  Level            size          speed   $/Mbyte    line size
  L1 I-/D-cache    8-64 KB       3 ns               32 B
  L2 cache         1-4 MB SRAM   6 ns    $100/MB    32 B
  Main memory      128 MB DRAM   60 ns   $1.50/MB   8 KB
  Disk             30 GB         8 ms    $0.05/MB

• The processor chip also holds the register file, the TLB, and split L1 instruction and data caches in front of the L2 cache.
• Moving outward: larger line size, higher associativity, more likely to write back.
• Options: separate data and instruction caches, or a unified cache.
30. 6.4.7 Cache performance metrics P505
• Miss rate
  – Fraction of memory references not found in the cache (misses / references)
  – Typical numbers: 3-10% for L1
• Hit rate
  – Fraction of memory references found in the cache (1 - miss rate)
31. Cache performance metrics
• Hit time
  – Time to deliver a line in the cache to the processor (includes the time to determine whether the line is in the cache)
  – Typical numbers: 1-2 clock cycles for L1, 5-10 clock cycles for L2
• Miss penalty
  – Additional time required because of a miss
  – Typically 25-100 cycles for main memory
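These metrics combine into the usual average-memory-access-time estimate, hit time + miss rate × miss penalty (a standard formula, not stated explicitly on the slides):

```c
/* Average memory access time, in cycles:
 * AMAT = hit_time + miss_rate * miss_penalty. */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```

With typical L1 numbers from this slide, say a 1-cycle hit time, a 5% miss rate, and a 100-cycle penalty to main memory, the average cost is about 6 cycles per access, which is why even small miss-rate improvements matter.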
32. Cache performance metrics P505
Design trade-offs:
• 1> Cache size: hit rate vs. hit time
• 2> Block size: spatial locality vs. temporal locality
• 3> Associativity: thrashing, cost, speed, miss penalty
• 4> Write strategy: simplicity, read misses, fewer transfers
34. Writing Cache-Friendly Code
• Principles
  – Programs with better locality will tend to have lower miss rates
  – Programs with lower miss rates will tend to run faster than programs with higher miss rates
35. Writing Cache-Friendly Code
• Basic approach
  – Make the common case go fast
    • Programs often spend most of their time in a few core functions
    • These functions often spend most of their time in a few loops
  – Minimize the number of cache misses in each inner loop
    • All other things being equal
36. Writing Cache-Friendly Code P507
With 4-word cache blocks, scanning v[0..7] in order misses once per block and hits on the rest:

  v[i]                            i=0   i=1   i=2   i=3   i=4   i=5   i=6   i=7
  Access order, [h]it or [m]iss   1[m]  2[h]  3[h]  4[h]  5[m]  6[h]  7[h]  8[h]

Temporal locality: variables like i and sum are usually put in registers.

int sumvec(int v[N])
{
    int i, sum = 0;

    for (i = 0; i < N; i++)
        sum += v[i];
    return sum;
}
37. Writing cache-friendly code
• Temporal locality
  – Repeated references to local variables are good because the compiler can cache them in the register file
38. Writing cache-friendly code
• Spatial locality
  – Stride-1 reference patterns are good because caches at all levels of the memory hierarchy store data as contiguous blocks
• Spatial locality is especially important in programs that operate on multidimensional arrays
39. Writing cache-friendly code P508
• Example: row-major (stride-1) traversal, M=4, N=8, 10 cycles/iteration

int sumarrayrows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
41. Writing cache-friendly code P508
• Example: column-major (stride-N) traversal, M=4, N=8, 20 cycles/iteration — the same sum, but the access pattern makes far poorer use of spatial locality

int sumarraycols(int v[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += v[i][j];
    return sum;
}
43. 6.6 Putting it Together: The Impact of Caches on Program Performance
6.6.1 The Memory Mountain
The Memory Mountain P512
• Read throughput (read bandwidth)
– The rate that a program reads data from the
memory system
• Memory mountain
– A two-dimensional function of read bandwidth
versus temporal and spatial locality
– Characterizes the capabilities of the memory
system for each computer
45. Memory mountain main routine
Figure 6.41 P513

/* mountain.c - Generate the memory mountain. */
#define MINBYTES  (1 << 10)             /* Working set size ranges from 1 KB */
#define MAXBYTES  (1 << 23)             /* ... up to 8 MB */
#define MAXSTRIDE 16                    /* Strides range from 1 to 16 */
#define MAXELEMS  (MAXBYTES / sizeof(int))

int data[MAXELEMS];                     /* The array we'll be traversing */
46. Memory mountain main routine

int main()
{
    int size;        /* Working set size (in bytes) */
    int stride;      /* Stride (in array elements) */
    double Mhz;      /* Clock frequency */

    init_data(data, MAXELEMS);   /* Initialize each element in data to 1 */
    Mhz = mhz(0);                /* Estimate the clock frequency */
48. Memory mountain test function
Figure 6.40 P512

/* The test function */
void test(int elems, int stride)
{
    int i, result = 0;
    volatile int sink;

    for (i = 0; i < elems; i += stride)
        result += data[i];
    sink = result;   /* So compiler doesn't optimize away the loop */
}
49. Memory mountain test function

/* Run test(elems, stride) and return read throughput (MB/s) */
double run(int size, int stride, double Mhz)
{
    double cycles;
    int elems = size / sizeof(int);

    test(elems, stride);                       /* Warm up the cache */
    cycles = fcyc2(test, elems, stride, 0);    /* Call test(elems, stride) */
    return (size / stride) / (cycles / Mhz);   /* Convert cycles to MB/s */
}
50. The Memory Mountain
• Data
  – Size: MAXBYTES (8 MB), i.e., MAXELEMS (2M) words
  – Partially accessed
    • Working set: from 8 MB down to 1 KB
    • Stride: from 1 to 16
51. The Memory Mountain
Figure 6.42 P514: The memory mountain for a Pentium III Xeon at 550 MHz, with a 16 KB on-chip L1 d-cache, a 16 KB on-chip L1 i-cache, and a 512 KB off-chip unified L2 cache. Read throughput (0-1200 MB/s) is plotted against stride (s1-s15, in words) and working set size (8 MB down to 2 KB). Ridges of temporal locality separate the L1, L2, and main memory regions; slopes of spatial locality run along the stride axis.
52. Ridges of temporal locality
• A slice through the memory mountain with stride = 1
  – Illuminates the read throughputs of the different caches and of main memory
53. Ridges of temporal locality
Figure 6.43 P515: Read throughput (0-1200 MB/s) versus working set size (8 MB down to 1 KB) at stride 1, showing distinct plateaus for the L1 cache region, the L2 cache region, and the main memory region.
54. A slope of spatial locality
• A slice through the memory mountain with size = 256 KB
  – Shows the cache block size
55. A slope of spatial locality
Figure 6.44 P516: Read throughput (0-800 MB/s) versus stride (s1-s16, in words) for a 256 KB working set. Throughput falls as the stride grows, leveling off once there is one access per cache line.