SlideShare a Scribd company logo
1
Intro and Snooping Protocols
• Topics: multi-core cache organizations, programming
models, cache coherence (snooping-based)
2
Multi-Core Cache Organizations
P
C
P
C
P
C
P
C
P
C
P
C
P
C
P
C
CCC CCC
CCC CCC
Private L1 caches
Shared L2 cache
Bus between L1s and single L2 cache controller
Snooping-based coherence between L1s
3
Multi-Core Cache Organizations
Private L1 caches
Shared L2 cache, but physically distributed
Scalable network
Directory-based coherence between L1s
P
C
P
C
P
C
P
C
P
C
P
C
P
C
P
C
4
Multi-Core Cache Organizations
Private L1 caches
Shared L2 cache, but physically distributed
Bus connecting the four L1s and four L2 banks
Snooping-based coherence between L1s
P
C
P
C
P
C
P
C
5
Multi-Core Cache Organizations
Private L1 caches
Private L2 caches
Scalable network
Directory-based coherence between L2s
(through a separate directory)
P
C
P
C
P
C
P
C
P
C
P
C
P
C
P
C
D
6
Shared-Memory Vs. Message Passing
• Shared-memory
 single copy of (shared) data in memory
 threads communicate by reading/writing to a shared
location
• Message-passing
 each thread has a copy of data in its own private
memory that other threads cannot access
 threads communicate by passing values with SEND/
RECEIVE message pairs
7
Ocean Kernel
Procedure Solve(A)
begin
diff = done = 0;
while (!done) do
diff = 0;
for i  1 to n do
for j  1 to n do
temp = A[i,j];
A[i,j]  0.2 * (A[i,j] + neighbors);
diff += abs(A[i,j] – temp);
end for
end for
if (diff < TOL) then done = 1;
end while
end procedure
8
Shared Address Space Model
int n, nprocs;
float **A, diff;
LOCKDEC(diff_lock);
BARDEC(bar1);
main()
begin
read(n); read(nprocs);
A  G_MALLOC();
initialize (A);
CREATE (nprocs,Solve,A);
WAIT_FOR_END (nprocs);
end main
procedure Solve(A)
int i, j, pid, done=0;
float temp, mydiff=0;
int mymin = 1 + (pid * n/procs);
int mymax = mymin + n/nprocs -1;
while (!done) do
mydiff = diff = 0;
BARRIER(bar1,nprocs);
for i  mymin to mymax
for j  1 to n do
…
endfor
endfor
LOCK(diff_lock);
diff += mydiff;
UNLOCK(diff_lock);
BARRIER (bar1, nprocs);
if (diff < TOL) then done = 1;
BARRIER (bar1, nprocs);
endwhile
9
Message Passing Model
main()
read(n); read(nprocs);
CREATE (nprocs-1, Solve);
Solve();
WAIT_FOR_END (nprocs-1);
procedure Solve()
int i, j, pid, nn = n/nprocs, done=0;
float temp, tempdiff, mydiff = 0;
myA  malloc(…)
initialize(myA);
while (!done) do
mydiff = 0;
if (pid != 0)
SEND(&myA[1,0], n, pid-1, ROW);
if (pid != nprocs-1)
SEND(&myA[nn,0], n, pid+1, ROW);
if (pid != 0)
RECEIVE(&myA[0,0], n, pid-1, ROW);
if (pid != nprocs-1)
RECEIVE(&myA[nn+1,0], n, pid+1, ROW);
for i  1 to nn do
for j  1 to n do
…
endfor
endfor
if (pid != 0)
SEND(mydiff, 1, 0, DIFF);
RECEIVE(done, 1, 0, DONE);
else
for i  1 to nprocs-1 do
RECEIVE(tempdiff, 1, *, DIFF);
mydiff += tempdiff;
endfor
if (mydiff < TOL) done = 1;
for i  1 to nprocs-1 do
SEND(done, 1, I, DONE);
endfor
endif
endwhile
10
Models for SEND and RECEIVE
• Synchronous: SEND returns control back to the program
only when the RECEIVE has completed
• Blocking Asynchronous: SEND returns control back to the
program after the OS has copied the message into its space
-- the program can now modify the sent data structure
• Nonblocking Asynchronous: SEND and RECEIVE return
control immediately – the message will get copied at some
point, so the process must overlap some other computation
with the communication – other primitives are used to
probe if the communication has finished or not
11
Deterministic Execution
• Need synch after every anti-diagonal
• Potential load imbalance
• Shared-memory vs. message passing
• Function of the model for SEND-RECEIVE
• Function of the algorithm: diagonal, red-black ordering
12
Cache Coherence
A multiprocessor system is cache coherent if
• a value written by a processor is eventually visible to
reads by other processors – write propagation
• two writes to the same location by two processors are
seen in the same order by all processors – write
serialization
13
Cache Coherence Protocols
• Directory-based: A single location (directory) keeps track
of the sharing status of a block of memory
• Snooping: Every cache block is accompanied by the sharing
status of that block – all cache controllers monitor the
shared bus so they can update the sharing status of the
block, if necessary
 Write-invalidate: a processor gains exclusive access of
a block before writing by invalidating all other copies
 Write-update: when a processor writes, it updates other
shared copies of that block
14
Protocol-I MSI
• 3-state write-back invalidation bus-based snooping protocol
• Each block can be in one of three states – invalid, shared,
modified (exclusive)
• A processor must acquire the block in exclusive state in
order to write to it – this is done by placing an exclusive
read request on the bus – every other cached copy is
invalidated
• When some other processor tries to read an exclusive
block, the block is demoted to shared
15
Design Issues, Optimizations
• When does memory get updated?
 demotion from modified to shared?
 move from modified in one cache to modified in another?
• Who responds with data? – memory or a cache that has
the block in exclusive state – does it help if sharers respond?
• We can assume that bus, memory, and cache state
transactions are atomic – if not, we will need more states
• A transition from shared to modified only requires an upgrade
request and no transfer of data
• Is the protocol simpler for a write-through cache?
16
4-State Protocol
• Multiprocessors execute many single-threaded programs
• A read followed by a write will generate bus transactions
to acquire the block in exclusive state even though there
are no sharers
• Note that we can optimize protocols by adding more
states – increases design/verification complexity
17
MESI Protocol
• The new state is exclusive-clean – the cache can service
read requests and no other cache has the same block
• When the processor attempts a write, the block is
upgraded to exclusive-modified without generating a bus
transaction
• When a processor makes a read request, it must detect
if it has the only cached copy – the interconnect must
include an additional signal that is asserted by each
cache if it has a valid copy of the block
18
Design Issues
• When caches evict blocks, they do not inform other
caches – it is possible to have a block in shared state
even though it is an exclusive-clean copy
• Cache-to-cache sharing: SRAM vs. DRAM latencies,
contention in remote caches, protocol complexities
(memory has to wait, which cache responds), can be
especially useful in distributed memory systems
• The protocol can be improved by adding a fifth
state (owner – MOESI) – the owner services reads
(instead of memory)
19
Update Protocol (Dragon)
• 4-state write-back update protocol, first used in the
Dragon multiprocessor (1984)
• Write-back update is not the same as write-through –
on a write, only caches are updated, not memory
• Goal: writes may usually not be on the critical path, but
subsequent reads may be
20
4 States
• No invalid state
• Modified and Exclusive-clean as before: used when there
is a sole cached copy
• Shared-clean: potentially multiple caches have this block
and main memory may or may not be up-to-date
• Shared-modified: potentially multiple caches have this
block, main memory is not up-to-date, and this cache
must update memory – only one block can be in Sm state
• In reality, one state would have sufficed – more states
to reduce traffic
21
Design Issues
• If the update is also sent to main memory, the Sm
state can be eliminated
• If all caches are informed when a block is evicted, the
block can be moved from shared to M or E – this can
help save future bus transactions
• Having an extra wire to determine exclusivity seems
like a worthy trade-off in update systems
22
Examples
P1 P2
MSI MESI Dragon MSI MESI Dragon
• P1: Rd X
• P1: Wr X
• P2: Rd X
• P1: Wr X
• P1: Wr X
• P2: Rd X
• P2: Wr X
Total transfers:

More Related Content

What's hot

Cache coherence problem and its solutions
Cache coherence problem and its solutionsCache coherence problem and its solutions
Cache coherence problem and its solutions
Majid Saleem
 
Lecture 6.1
Lecture  6.1Lecture  6.1
Lecture 6.1Mr SMAK
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
Siddhartha Anand
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
Haris456
 
Shared-Memory Multiprocessors
Shared-Memory MultiprocessorsShared-Memory Multiprocessors
Shared-Memory Multiprocessors
Salvatore La Bua
 
Hardware multithreading
Hardware multithreadingHardware multithreading
Hardware multithreading
Fraboni Ec
 
Hardware Multi-Threading
Hardware Multi-ThreadingHardware Multi-Threading
Hardware Multi-Threading
babuece
 
What is simultaneous multithreading
What is simultaneous multithreadingWhat is simultaneous multithreading
What is simultaneous multithreading
Fraboni Ec
 
VLIW Processors
VLIW ProcessorsVLIW Processors
VLIW Processors
Sudhanshu Janwadkar
 
Multithreading models
Multithreading modelsMultithreading models
Multithreading modelsanoopkrishna2
 
Multithreading
Multithreading Multithreading
Multithreading
WafaQKhan
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
A B Shinde
 
Cache design
Cache design Cache design
Cache design
maamir farooq
 
Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
Smruti Sarangi
 
Os Threads
Os ThreadsOs Threads
Os Threads
Salman Memon
 
multi-threading
multi-threadingmulti-threading
multi-threading
Ezzat Gul
 
Linux kernel module programming guide
Linux kernel module programming guideLinux kernel module programming guide
Linux kernel module programming guide
Dũng Nguyễn
 

What's hot (20)

Cache coherence problem and its solutions
Cache coherence problem and its solutionsCache coherence problem and its solutions
Cache coherence problem and its solutions
 
Lecture 6.1
Lecture  6.1Lecture  6.1
Lecture 6.1
 
Memory models
Memory modelsMemory models
Memory models
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
 
Shared-Memory Multiprocessors
Shared-Memory MultiprocessorsShared-Memory Multiprocessors
Shared-Memory Multiprocessors
 
Hardware multithreading
Hardware multithreadingHardware multithreading
Hardware multithreading
 
Hardware Multi-Threading
Hardware Multi-ThreadingHardware Multi-Threading
Hardware Multi-Threading
 
What is simultaneous multithreading
What is simultaneous multithreadingWhat is simultaneous multithreading
What is simultaneous multithreading
 
VLIW Processors
VLIW ProcessorsVLIW Processors
VLIW Processors
 
Multithreading models
Multithreading modelsMultithreading models
Multithreading models
 
Multithreading
Multithreading Multithreading
Multithreading
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
 
Cache design
Cache design Cache design
Cache design
 
Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
 
Os Threads
Os ThreadsOs Threads
Os Threads
 
4 threads
4 threads4 threads
4 threads
 
multi-threading
multi-threadingmulti-threading
multi-threading
 
Lecture2
Lecture2Lecture2
Lecture2
 
Linux kernel module programming guide
Linux kernel module programming guideLinux kernel module programming guide
Linux kernel module programming guide
 

Viewers also liked

Client Server System Development
Client Server System DevelopmentClient Server System Development
Client Server System Development
ManjuShanmugam1593
 
Clientserver Presentation
Clientserver PresentationClientserver Presentation
Clientserver PresentationTuhin_Das
 
Client Server Architecture
Client Server ArchitectureClient Server Architecture
Client Server Architecturesuks_87
 

Viewers also liked (6)

Client Server System Development
Client Server System DevelopmentClient Server System Development
Client Server System Development
 
Email Client Server System
Email Client Server SystemEmail Client Server System
Email Client Server System
 
Slide05 Message Passing Architecture
Slide05 Message Passing ArchitectureSlide05 Message Passing Architecture
Slide05 Message Passing Architecture
 
Clientserver Presentation
Clientserver PresentationClientserver Presentation
Clientserver Presentation
 
Client Server Architecture
Client Server ArchitectureClient Server Architecture
Client Server Architecture
 
Client server architecture
Client server architectureClient server architecture
Client server architecture
 

Similar to Snooping 2

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
ssuser5c9d4b1
 
Introduction 1
Introduction 1Introduction 1
Introduction 1
Yasir Khan
 
27 multicore
27 multicore27 multicore
27 multicore
ssuser47ae65
 
27 multicore
27 multicore27 multicore
27 multicore
Rishabh Jain
 
Pune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCD
Prashant Rane
 
Memory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfMemory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdf
rajaratna4
 
22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
Kathirvel Ayyaswamy
 
Cache Coherence.pptx
Cache Coherence.pptxCache Coherence.pptx
Cache Coherence.pptx
SamyakJain710491
 
7. Memory management in operating system.ppt
7. Memory management in operating system.ppt7. Memory management in operating system.ppt
7. Memory management in operating system.ppt
imrank39199
 
cache memory
 cache memory cache memory
cache memory
NAHID HASAN
 
Operating systems- Main Memory Management
Operating systems- Main Memory ManagementOperating systems- Main Memory Management
Operating systems- Main Memory Management
Chandrakant Divate
 
Linux Memory Analysis with Volatility
Linux Memory Analysis with VolatilityLinux Memory Analysis with Volatility
Linux Memory Analysis with Volatility
Andrew Case
 
Memory Management.pdf
Memory Management.pdfMemory Management.pdf
Memory Management.pdf
SujanTimalsina5
 
Ch8 main memory
Ch8   main memoryCh8   main memory
Ch8 main memory
Welly Dian Astika
 
module4.ppt
module4.pptmodule4.ppt
module4.ppt
Subhasis Dash
 
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NL
Thijs Terlouw
 
Architecture of cash memory for engineering ch 3.pptx
Architecture of cash memory  for engineering ch 3.pptxArchitecture of cash memory  for engineering ch 3.pptx
Architecture of cash memory for engineering ch 3.pptx
magedsrhan773
 
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Heechul Yun
 

Similar to Snooping 2 (20)

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Introduction 1
Introduction 1Introduction 1
Introduction 1
 
12-6810-12.ppt
12-6810-12.ppt12-6810-12.ppt
12-6810-12.ppt
 
27 multicore
27 multicore27 multicore
27 multicore
 
27 multicore
27 multicore27 multicore
27 multicore
 
Pune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCD
 
Memory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfMemory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdf
 
22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
 
Cache Coherence.pptx
Cache Coherence.pptxCache Coherence.pptx
Cache Coherence.pptx
 
7. Memory management in operating system.ppt
7. Memory management in operating system.ppt7. Memory management in operating system.ppt
7. Memory management in operating system.ppt
 
cache memory
 cache memory cache memory
cache memory
 
Operating systems- Main Memory Management
Operating systems- Main Memory ManagementOperating systems- Main Memory Management
Operating systems- Main Memory Management
 
Linux Memory Analysis with Volatility
Linux Memory Analysis with VolatilityLinux Memory Analysis with Volatility
Linux Memory Analysis with Volatility
 
Memory Management.pdf
Memory Management.pdfMemory Management.pdf
Memory Management.pdf
 
Ch8 main memory
Ch8   main memoryCh8   main memory
Ch8 main memory
 
module4.ppt
module4.pptmodule4.ppt
module4.ppt
 
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NL
 
Architecture of cash memory for engineering ch 3.pptx
Architecture of cash memory  for engineering ch 3.pptxArchitecture of cash memory  for engineering ch 3.pptx
Architecture of cash memory for engineering ch 3.pptx
 
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
 
1083 wang
1083 wang1083 wang
1083 wang
 

More from Yasir Khan

Lecture 6
Lecture 6Lecture 6
Lecture 6
Yasir Khan
 
Lecture 4
Lecture 4Lecture 4
Lecture 4
Yasir Khan
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
Yasir Khan
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
Yasir Khan
 
Lec#1
Lec#1Lec#1
Lec#1
Yasir Khan
 
Ch10 (1)
Ch10 (1)Ch10 (1)
Ch10 (1)
Yasir Khan
 
Ch09
Ch09Ch09
Ch05
Ch05Ch05
Hpc sys
Hpc sysHpc sys
Hpc sys
Yasir Khan
 
Hpc 6 7
Hpc 6 7Hpc 6 7
Hpc 6 7
Yasir Khan
 
Hpc 4 5
Hpc 4 5Hpc 4 5
Hpc 4 5
Yasir Khan
 
Hpc 3
Hpc 3Hpc 3
Hpc 3
Yasir Khan
 
Hpc 2
Hpc 2Hpc 2
Hpc 2
Yasir Khan
 
Hpc 1
Hpc 1Hpc 1
Hpc 1
Yasir Khan
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
Yasir Khan
 
Dir based imp_5
Dir based imp_5Dir based imp_5
Dir based imp_5
Yasir Khan
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 

More from Yasir Khan (20)

Lecture 6
Lecture 6Lecture 6
Lecture 6
 
Lecture 4
Lecture 4Lecture 4
Lecture 4
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
Lec#1
Lec#1Lec#1
Lec#1
 
Ch10 (1)
Ch10 (1)Ch10 (1)
Ch10 (1)
 
Ch09
Ch09Ch09
Ch09
 
Ch05
Ch05Ch05
Ch05
 
Hpc sys
Hpc sysHpc sys
Hpc sys
 
Hpc 6 7
Hpc 6 7Hpc 6 7
Hpc 6 7
 
Hpc 4 5
Hpc 4 5Hpc 4 5
Hpc 4 5
 
Hpc 3
Hpc 3Hpc 3
Hpc 3
 
Hpc 2
Hpc 2Hpc 2
Hpc 2
 
Hpc 1
Hpc 1Hpc 1
Hpc 1
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
 
Dir based imp_5
Dir based imp_5Dir based imp_5
Dir based imp_5
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Uncertainity
Uncertainity Uncertainity
Uncertainity
 
Logic
LogicLogic
Logic
 
M6 game
M6 gameM6 game
M6 game
 

Recently uploaded

Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
Vivekanand Anglo Vedic Academy
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 

Recently uploaded (20)

Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 

Snooping 2

  • 1. 1 Intro and Snooping Protocols • Topics: multi-core cache organizations, programming models, cache coherence (snooping-based)
  • 2. 2 Multi-Core Cache Organizations P C P C P C P C P C P C P C P C CCC CCC CCC CCC Private L1 caches Shared L2 cache Bus between L1s and single L2 cache controller Snooping-based coherence between L1s
  • 3. 3 Multi-Core Cache Organizations Private L1 caches Shared L2 cache, but physically distributed Scalable network Directory-based coherence between L1s P C P C P C P C P C P C P C P C
  • 4. 4 Multi-Core Cache Organizations Private L1 caches Shared L2 cache, but physically distributed Bus connecting the four L1s and four L2 banks Snooping-based coherence between L1s P C P C P C P C
  • 5. 5 Multi-Core Cache Organizations Private L1 caches Private L2 caches Scalable network Directory-based coherence between L2s (through a separate directory) P C P C P C P C P C P C P C P C D
  • 6. 6 Shared-Memory Vs. Message Passing • Shared-memory  single copy of (shared) data in memory  threads communicate by reading/writing to a shared location • Message-passing  each thread has a copy of data in its own private memory that other threads cannot access  threads communicate by passing values with SEND/ RECEIVE message pairs
  • 7. 7 Ocean Kernel Procedure Solve(A) begin diff = done = 0; while (!done) do diff = 0; for i  1 to n do for j  1 to n do temp = A[i,j]; A[i,j]  0.2 * (A[i,j] + neighbors); diff += abs(A[i,j] – temp); end for end for if (diff < TOL) then done = 1; end while end procedure
  • 8. 8 Shared Address Space Model int n, nprocs; float **A, diff; LOCKDEC(diff_lock); BARDEC(bar1); main() begin read(n); read(nprocs); A  G_MALLOC(); initialize (A); CREATE (nprocs,Solve,A); WAIT_FOR_END (nprocs); end main procedure Solve(A) int i, j, pid, done=0; float temp, mydiff=0; int mymin = 1 + (pid * n/procs); int mymax = mymin + n/nprocs -1; while (!done) do mydiff = diff = 0; BARRIER(bar1,nprocs); for i  mymin to mymax for j  1 to n do … endfor endfor LOCK(diff_lock); diff += mydiff; UNLOCK(diff_lock); BARRIER (bar1, nprocs); if (diff < TOL) then done = 1; BARRIER (bar1, nprocs); endwhile
  • 9. 9 Message Passing Model main() read(n); read(nprocs); CREATE (nprocs-1, Solve); Solve(); WAIT_FOR_END (nprocs-1); procedure Solve() int i, j, pid, nn = n/nprocs, done=0; float temp, tempdiff, mydiff = 0; myA  malloc(…) initialize(myA); while (!done) do mydiff = 0; if (pid != 0) SEND(&myA[1,0], n, pid-1, ROW); if (pid != nprocs-1) SEND(&myA[nn,0], n, pid+1, ROW); if (pid != 0) RECEIVE(&myA[0,0], n, pid-1, ROW); if (pid != nprocs-1) RECEIVE(&myA[nn+1,0], n, pid+1, ROW); for i  1 to nn do for j  1 to n do … endfor endfor if (pid != 0) SEND(mydiff, 1, 0, DIFF); RECEIVE(done, 1, 0, DONE); else for i  1 to nprocs-1 do RECEIVE(tempdiff, 1, *, DIFF); mydiff += tempdiff; endfor if (mydiff < TOL) done = 1; for i  1 to nprocs-1 do SEND(done, 1, I, DONE); endfor endif endwhile
  • 10. 10 Models for SEND and RECEIVE • Synchronous: SEND returns control back to the program only when the RECEIVE has completed • Blocking Asynchronous: SEND returns control back to the program after the OS has copied the message into its space -- the program can now modify the sent data structure • Nonblocking Asynchronous: SEND and RECEIVE return control immediately – the message will get copied at some point, so the process must overlap some other computation with the communication – other primitives are used to probe if the communication has finished or not
  • 11. 11 Deterministic Execution • Need synch after every anti-diagonal • Potential load imbalance • Shared-memory vs. message passing • Function of the model for SEND-RECEIVE • Function of the algorithm: diagonal, red-black ordering
  • 12. 12 Cache Coherence A multiprocessor system is cache coherent if • a value written by a processor is eventually visible to reads by other processors – write propagation • two writes to the same location by two processors are seen in the same order by all processors – write serialization
  • 13. 13 Cache Coherence Protocols • Directory-based: A single location (directory) keeps track of the sharing status of a block of memory • Snooping: Every cache block is accompanied by the sharing status of that block – all cache controllers monitor the shared bus so they can update the sharing status of the block, if necessary  Write-invalidate: a processor gains exclusive access of a block before writing by invalidating all other copies  Write-update: when a processor writes, it updates other shared copies of that block
  • 14. 14 Protocol-I MSI • 3-state write-back invalidation bus-based snooping protocol • Each block can be in one of three states – invalid, shared, modified (exclusive) • A processor must acquire the block in exclusive state in order to write to it – this is done by placing an exclusive read request on the bus – every other cached copy is invalidated • When some other processor tries to read an exclusive block, the block is demoted to shared
  • 15. 15 Design Issues, Optimizations • When does memory get updated?  demotion from modified to shared?  move from modified in one cache to modified in another? • Who responds with data? – memory or a cache that has the block in exclusive state – does it help if sharers respond? • We can assume that bus, memory, and cache state transactions are atomic – if not, we will need more states • A transition from shared to modified only requires an upgrade request and no transfer of data • Is the protocol simpler for a write-through cache?
  • 16. 16 4-State Protocol • Multiprocessors execute many single-threaded programs • A read followed by a write will generate bus transactions to acquire the block in exclusive state even though there are no sharers • Note that we can optimize protocols by adding more states – increases design/verification complexity
  • 17. 17 MESI Protocol • The new state is exclusive-clean – the cache can service read requests and no other cache has the same block • When the processor attempts a write, the block is upgraded to exclusive-modified without generating a bus transaction • When a processor makes a read request, it must detect if it has the only cached copy – the interconnect must include an additional signal that is asserted by each cache if it has a valid copy of the block
  • 18. 18 Design Issues • When caches evict blocks, they do not inform other caches – it is possible to have a block in shared state even though it is an exclusive-clean copy • Cache-to-cache sharing: SRAM vs. DRAM latencies, contention in remote caches, protocol complexities (memory has to wait, which cache responds), can be especially useful in distributed memory systems • The protocol can be improved by adding a fifth state (owner – MOESI) – the owner services reads (instead of memory)
  • 19. 19 Update Protocol (Dragon) • 4-state write-back update protocol, first used in the Dragon multiprocessor (1984) • Write-back update is not the same as write-through – on a write, only caches are updated, not memory • Goal: writes may usually not be on the critical path, but subsequent reads may be
  • 20. 20 4 States • No invalid state • Modified and Exclusive-clean as before: used when there is a sole cached copy • Shared-clean: potentially multiple caches have this block and main memory may or may not be up-to-date • Shared-modified: potentially multiple caches have this block, main memory is not up-to-date, and this cache must update memory – only one block can be in Sm state • In reality, one state would have sufficed – more states to reduce traffic
  • 21. 21 Design Issues • If the update is also sent to main memory, the Sm state can be eliminated • If all caches are informed when a block is evicted, the block can be moved from shared to M or E – this can help save future bus transactions • Having an extra wire to determine exclusivity seems like a worthy trade-off in update systems
  • 22. 22 Examples P1 P2 MSI MESI Dragon MSI MESI Dragon • P1: Rd X • P1: Wr X • P2: Rd X • P1: Wr X • P1: Wr X • P2: Rd X • P2: Wr X Total transfers: