Implementation of Non-Uniform
Memory Access (NUMA) Systems
Project Submitted
by
PALLAB KUMAR RAY
(ME 2014-10001)
Under the supervision of
Mr. Somak Das
(Dept. of CSE/IT)
INTRODUCTION
• In a shared memory multiprocessor, all main memory is accessible to and shared
by all processors, and the cost of accessing shared memory is the same for every
processor. From a memory access viewpoint, such systems are called UMA, or
Uniform Memory Access, systems.
• A particular category of shared memory multiprocessor is NUMA or Non-Uniform
Memory Access. It is a shared memory architecture that describes the placement
of main memory modules with respect to processors in a multiprocessor system.
Like most other processor architectural features, ignorance of NUMA can
result in sub-par application memory performance.
NUMA
In the NUMA shared memory architecture, each processor has its own local memory module that it
can access directly with a distinct performance advantage. At the same time, it can also access
any memory module belonging to another processor using a shared bus (or some other type of
interconnect), as seen in the diagram.
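The local/remote distinction above can be illustrated with a toy Python model. The latency numbers are illustrative assumptions, not measurements from the slides:

```python
# Toy model of NUMA access cost (illustrative numbers, not measurements).
LOCAL_LATENCY_NS = 100    # access to the processor's own memory module
REMOTE_LATENCY_NS = 300   # access over the shared interconnect

def access_latency(cpu_node: int, memory_node: int) -> int:
    """Return the modeled latency for cpu_node accessing memory_node."""
    return LOCAL_LATENCY_NS if cpu_node == memory_node else REMOTE_LATENCY_NS

# A processor on node 0 reaches its local module faster than node 1's module.
print(access_latency(0, 0))  # local access  -> 100
print(access_latency(0, 1))  # remote access -> 300
```

The same asymmetry is why NUMA-aware placement of data matters for application performance.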
Shared Memory
In contrast to "shared nothing" architectures, memory is globally accessible under shared memory. Communication is
anonymous; there is no explicit recipient of a shared memory access, as in message passing, and processors may
communicate without necessarily being aware of one another. Shared memory provides two services:
a. Direct access to another processor's local memory.
b. Automatic address mapping of a (virtual) memory address onto a (processor, local memory address) pair.
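Service (b), mapping a flat address onto a (processor, local memory address) pair, can be sketched in Python, assuming a hypothetical fixed-size memory module per processor:

```python
MODULE_SIZE = 2 ** 8  # assume each processor owns a 256-byte module (illustrative)

def map_address(global_addr: int) -> tuple:
    """Map a flat global address to a (processor, local_address) pair."""
    processor = global_addr // MODULE_SIZE   # which processor's module
    local_addr = global_addr % MODULE_SIZE   # offset within that module
    return processor, local_addr

print(map_address(0x1A6))  # (1, 0xA6): byte 0xA6 in processor 1's module
```

Real systems perform this mapping in hardware (and at coarser page granularity), but the arithmetic is the same idea.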
1. Convergence of parallel architectures :- While cc-NUMA architectures add specialized support for shared
memory, e.g. coherence control, they still rely on fine-grained message passing involving short messages. So do
single-sided architectures. So it appears that designs are converging, with the important details handled through
a combination of software and specialized support.
2. The Cache Coherence Problem:- Owing to the use of cache memories in modern computer architectures, shared
memory introduces the cache coherence problem. Cache coherence arises with shared data that is to be written
and read. If one processor modifies a shared cached value, then the other processor(s) must get the latest value.
Coherence says nothing about when changes propagate through the memory subsystem, only that they will
eventually happen. Other steps must be taken (usually in software) to avoid race conditions that could lead to
non-deterministic program behavior.
a. Program order :- If a processor writes and then reads the same location X, and there are no other intervening writes by other
processors to X , then the read will always return the value previously written.
b. Definition of a coherent view of memory :- If a processor P reads from location X that was previously written by a processor Q,
then the read will return the value previously written, if a sufficient amount of time has elapsed between the read and the write.
c. Serialization of writes :- Multiple writes to a location X happen sequentially. If two processors write to the same location, then
other processors reading X will observe the same sequence of values in the order written. If a 10 and then a 20 is written to X, then
it is not possible for any processor to read 20 and then 10.
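Property (c) can be demonstrated with a small Python sketch in which all writes to X are funneled through a single serialization point, so every observer sees the same order. This is a software model of the guarantee, not real cache hardware:

```python
class SerializedLocation:
    """Models a memory location whose writes are serialized:
    every reader observes values in the single order written."""

    def __init__(self):
        self.history = []           # the one agreed-upon write order

    def write(self, value):
        self.history.append(value)  # all writes pass through this point

    def observed_order(self):
        # Every processor reading X sees (a prefix of) the same sequence.
        return list(self.history)

x = SerializedLocation()
x.write(10)
x.write(20)
print(x.observed_order())  # [10, 20] -- no observer can see 20 then 10
```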
3. Managing coherence:- There are two major strategies for managing coherence
a. Snooping protocol. In this bus-based scheme processors passively listen for bus activity, updating or invalidating cache entries as
necessary. The scheme is ultimately non-scalable, and isn't appropriate for machines with tens of processors or more.
b. Directory-based. This is a scalable scheme employing point-to-point messages to handle coherence. A memory structure called
a directory maintains information about data sharing. This scheme was first applied to cache-coherent multiprocessors by the DASH
project at Stanford and is used in the SGI Altix 3000, which is a CC-NUMA architecture: the access time to memory varies
depending on the location of the processor and the address accessed.
The three goals of DASH are :-
• Scalable memory bandwidth
• Scalable cost (use commodity parts)
• Deal with large memory latencies
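A minimal directory-based protocol can be sketched in Python: the directory records which caches share each line and sends point-to-point invalidations before a write. This is a simplification of MSI-style protocols for illustration, not the actual DASH implementation:

```python
class Directory:
    """Tracks, per memory line, which caches hold a copy."""

    def __init__(self):
        self.sharers = {}   # line address -> set of cache ids holding it
        self.messages = []  # point-to-point invalidations sent

    def read(self, cache_id, line):
        # Record the reader as a sharer of the line.
        self.sharers.setdefault(line, set()).add(cache_id)

    def write(self, cache_id, line):
        # Invalidate every other sharer before granting the write.
        for other in sorted(self.sharers.get(line, set()) - {cache_id}):
            self.messages.append(("invalidate", other, line))
        self.sharers[line] = {cache_id}  # writer is now the sole holder

d = Directory()
d.read(0, 0xA6)
d.read(1, 0xA6)
d.write(0, 0xA6)        # cache 1 must be invalidated first
print(d.messages)       # [('invalidate', 1, 166)]
print(d.sharers[0xA6])  # {0}
```

Because invalidations go only to recorded sharers rather than being broadcast on a bus, this structure scales to the large processor counts that snooping cannot handle.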
Memory Architecture block diagram
The memory unit has an 8-bit address bus, two 8-bit data buses (one for input and one for output), a clock
input, and a 1-bit write-enable input.
When write_enable is set HIGH (1), the incoming data (arriving on the data_in bus) is first stored at the
memory address specified by the address bus, and then the newly written data is fetched from the same address
and output on the data_out bus.
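The write-then-readback behavior described above can be modeled in Python. This is a behavioral sketch of the chip, not synthesizable hardware:

```python
class MemoryChip:
    """Behavioral model: 8-bit address, 8-bit data in/out, 1-bit write enable."""

    def __init__(self):
        self.memory = [0] * 256  # 2^8 addresses, one byte each
        self.data_out = 0

    def clock(self, address, data_in, write_enable):
        if write_enable:
            self.memory[address] = data_in    # store incoming data first
        self.data_out = self.memory[address]  # then fetch from the same address
        return self.data_out

chip = MemoryChip()
print(hex(chip.clock(address=0xA6, data_in=0x9F, write_enable=1)))  # 0x9f
print(hex(chip.clock(address=0xA6, data_in=0x00, write_enable=0)))  # 0x9f (unchanged)
```

With write_enable LOW, data_in is ignored and data_out simply reflects the stored contents of the addressed location.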
Circuit diagram
The chip has a write_enable input and three more inputs: the clk (which is held HIGH), an 8-bit data input bus,
and an 8-bit address bus.
When data is written into the memory, write_enable is HIGH: the data passes through the chip and is stored at the
location given by the address bus. When data is fetched, the output data appears on the data_out bus; the clk must
be high (enabled) at that time.
Memory chip algorithm
• This is the algorithm used for the memory chip (Verilog-style, wrapped in a clocked always block, with the missing end added):
always @(posedge clk)
begin
  if (write_enable)
  begin
    memory[address] <= data_in;
  end
  data_out <= memory[address];
end
An Example of the Memory Architecture
When the Write Enable is LOW
At this time clk = HIGH
Data_in = X (don't care)
Address = (A6)H [10100110]
Write_enable = 0
Then:
Data_out = (8B)H [10001011]
Data_out = memory[A6]
This follows from the line
data_out <= memory[address];
The diagram shows how the circuit works when write_enable is low.
When the Write Enable is HIGH
At this time clk = HIGH
Data_in = (9F)H [10011111]
Address = (A6)H [10100110]
Write_enable = 1
Then:
Mem[A6] = 9F (data_in)
Data_out = mem[A6]
Data_out = (9F)H [10011111]
This follows from the line
data_out <= memory[address];
Here, because write enable is HIGH, the data stored at address A6 is changed: 9FH is now the present data at address A6.
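Both worked examples can be replayed with a small Python model of the behavior, assuming address A6 initially holds 8BH as the write-enable-LOW example implies:

```python
memory = [0] * 256
memory[0xA6] = 0x8B  # assumed prior contents, per the write-enable-LOW example

def clock(address, data_in, write_enable):
    if write_enable:
        memory[address] = data_in  # memory[address] <= data_in
    return memory[address]         # data_out <= memory[address]

# Write enable LOW: data_in is don't-care; output is the stored value.
out_low = clock(address=0xA6, data_in=0x00, write_enable=0)
print(hex(out_low))   # 0x8b

# Write enable HIGH: 9FH is stored at A6, then read back.
out_high = clock(address=0xA6, data_in=0x9F, write_enable=1)
print(hex(out_high))  # 0x9f
```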
Simulation Architecture of memory chip
Output
References
[1] "NUMA coding for VLSI and Source code." https://books.google.co.in/books
[2] "Introduction to NUMA architecture with shared memory."
https://computing.llnl.gov/tutorials/parallel_comp
[3] Intel on NUMA
[4] "UMA-NUMA Scalability."
www.cs.drexel.edu/~wmm24/cs281/lectures/ppt/cs282_lec12.ppt
THANK YOU
