NUMA
MAATALLA Abed
abedmaatalla@gmail.com
• What is NUMA?
• History of processors.
• A closer look at NUMA.
• UMA, NUMA & NUMA SMP architectures.
• Barriers of NUMA.
• Solutions.
• Existing simulators.
• Benefits of NUMA
What is NUMA?
• Non-Uniform Memory Access: some regions of memory take
longer to access than others.
• Designed to improve scalability on large SMPs.
• A processor can access its own local memory faster than
non-local memory.
SMP: symmetric multiprocessing
What is NUMA?
• Groups of processors (NUMA node) have their own local
memory
– Any processor can access any memory, including the
one not "owned" by its group (remote memory)
– Non-uniform: accessing local memory is faster than
accessing remote memory
What is NUMA?
• Nodes are linked to each other by a high-speed interconnect.
• NUMA limits the number of CPUs that share a single memory bus.
• Each group of processors has its own memory and possibly its own I/O
channels.
• The number of CPUs within a NUMA node depends on the hardware
vendor.
What is NUMA?
• Facts:
– (Most) memory is
allocated at task startup.
– Tasks are (usually) free to
run on any processor.
As a result, both local and remote
accesses can happen
during a task's lifetime.
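The two facts above can be sketched in a few lines. This is only an illustration: the function name `count_accesses` and the idea of recording one node id per memory access are assumptions made for the example, not something the slides define.

```python
def count_accesses(memory_node, schedule):
    """Count local and remote accesses for one task.

    memory_node: node where the task's memory was allocated at startup.
    schedule:    node the task was running on at each memory access
                 (hypothetical trace format, one entry per access).
    """
    local = sum(1 for node in schedule if node == memory_node)
    remote = len(schedule) - local
    return local, remote


# Memory allocated on node 0; the task later migrates to node 1.
print(count_accesses(0, [0, 0, 0, 1, 1]))  # (3, 2)
```

Once the scheduler moves the task away from node 0, every later access in this model becomes remote, because the memory stays where it was first allocated.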
History of processors.
• The common mental model of CPUs is stuck in the 1980s: basically
boxes that do arithmetic, logic, bit twiddling and shifting,
and loading and storing things in memory. But there have been
various newer developments since then, such as vector instructions
(SIMD) and hardware support for virtualization.
• Many supercomputer designs of the 1980s and 1990s
focused on providing high-speed memory access as
opposed to faster processors, allowing the computers to
work on large data sets at speeds other systems could
not approach.
History of processors.
• The first commercial implementation of a NUMA-based
Unix system was the Symmetrical Multi Processing XPS-100
family of servers, designed by Dan Gielan of VAST
Corporation for Honeywell Information Systems Italy.
A closer look at NUMA
• One can view NUMA as a tightly coupled form of cluster
computing. The addition of virtual memory paging to a
cluster architecture can allow the implementation of
NUMA entirely in software. However, the inter-node
latency of software-based NUMA remains several orders
of magnitude greater (slower) than that of hardware-based
NUMA.
• NUMA addresses performance problems by
providing separate memory for each processor,
avoiding the performance hit when several processors
attempt to address the same memory.
A closer look at NUMA
• Threads that share memory should be on the same
socket, and a thread that does heavy memory-mapped I/O should
make sure it’s on the socket that’s closest to the I/O
device it’s talking to.
• There are multiple levels of memory, such as CC and LLC,
because CPUs became faster and needed faster
memory access; this is called the memory tree.
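The placement advice in the first bullet above can be sketched as a small cost function. Everything here is an assumption made for illustration: the three-node `DISTANCE` table, its values, and the `io_weight` parameter are hypothetical and do not come from the slides or from any real machine.

```python
# Hypothetical node-to-node distance table (smaller = closer, 10 = same node).
DISTANCE = [
    [10, 20, 30],
    [20, 10, 20],
    [30, 20, 10],
]


def best_node(data_node, io_node, io_weight=1.0):
    """Pick the node with the lowest weighted distance to the thread's
    shared data and to the I/O device it talks to."""
    def cost(node):
        return DISTANCE[node][data_node] + io_weight * DISTANCE[node][io_node]

    return min(range(len(DISTANCE)), key=cost)


print(best_node(0, 0))                 # 0: data and device are both on node 0
print(best_node(0, 2, io_weight=3.0))  # 2: an I/O-heavy thread moves toward the device
```

Raising `io_weight` models a thread whose I/O traffic matters more than its shared-memory traffic, which is the case the slide singles out.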
A closer look at NUMA
• NUMA vs. ccNUMA: the difference is almost
nonexistent at this point. ccNUMA stands for Cache-Coherent
NUMA, but NUMA and ccNUMA have really
come to be synonymous. The applications for non-cache-coherent
NUMA machines are almost nonexistent, and
they are a real pain to program for, so unless specifically
stated otherwise, NUMA actually means ccNUMA.
A closer look at NUMA
• When a processor looks for data at a certain memory
address, it first looks in the L1 cache on the
microprocessor itself, then in a somewhat larger L2
cache nearby, and then in a third level of cache
that the NUMA configuration provides, before seeking the
data in the "remote memory" located near the other
microprocessors. Each of these groups of processors is a node in the
interconnection network. NUMA maintains a hierarchical
view of the data on all the nodes.
• Interconnection network (ICN): as mentioned above, the
ICN links the nodes so that they can exchange data
(just as a physical link allows data exchange in a
cluster).
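The lookup order described above can be sketched as a walk down a list of levels. The latencies are made-up illustrative numbers, and the "local memory" level is an assumption added for completeness (the slide only names the caches and remote memory).

```python
# Levels are searched in order. Latencies (ns) are illustrative assumptions.
LEVELS = [
    ("L1 cache", 1),
    ("L2 cache", 4),
    ("L3 cache", 20),
    ("local memory", 100),   # assumed level, not named on the slide
    ("remote memory", 150),
]


def lookup(address, contents):
    """Search each level in turn.

    contents maps a level name to the set of addresses it currently holds.
    Returns (level where the address was found, total ns spent searching).
    """
    elapsed = 0
    for name, latency_ns in LEVELS:
        elapsed += latency_ns
        if address in contents.get(name, set()):
            return name, elapsed
    raise KeyError(f"address {address:#x} not found in any level")


contents = {"L1 cache": {0x10}, "remote memory": {0x20}}
print(lookup(0x10, contents))  # ('L1 cache', 1)
print(lookup(0x20, contents))  # ('remote memory', 275)
```

In this simple model a miss at every level adds up, so data that lives only in remote memory pays for the whole walk.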
UMA, NUMA & NUMA SMP architectures
• Uniform Memory Access (UMA): all
processors have the same latency to
access memory. This architecture
scales only to a limited number of
processors.
• Non-Uniform Memory
Access (NUMA): each processor has
its own local memory. The memory of
other processors is accessible, but the
latency to access it is not the
same; such an access is called a "remote
memory access".
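The contrast between the two bullets above can be written as an expected access time. The symbols are introduced here for illustration and do not appear in the slides: p_local is the fraction of accesses that hit local memory, and t_local and t_remote are the local and remote latencies.

```latex
T_{\text{avg}} = p_{\text{local}}\, t_{\text{local}} + (1 - p_{\text{local}})\, t_{\text{remote}}
```

Under UMA, t_local = t_remote = t, so T_avg = t whatever p_local is. Under NUMA, with assumed values t_local = 100 ns, t_remote = 150 ns and p_local = 0.8, the formula gives 0.8 × 100 + 0.2 × 150 = 110 ns, which shows why keeping accesses local matters.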
UMA, NUMA & NUMA SMP architectures
• NUMA SMP: the hardware
trend is to use NUMA systems
with several NUMA nodes, as
shown in the figure. A NUMA node
has a group of processors
with shared memory. A
NUMA node can use its local
bus to interact with local
memory. Multiple NUMA
nodes can be combined to form an
SMP. A common SMP bus can
interconnect all NUMA nodes.
Barriers of NUMA.
• Spread data between memories.
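The slide does not name a policy for spreading data. As one possible illustration, round-robin interleaving of pages across nodes can be sketched as follows; the function names are hypothetical and the model ignores page sizes and access patterns.

```python
def interleave_pages(num_pages, num_nodes):
    """Assign page i to node (i mod num_nodes), i.e. round-robin placement."""
    return [page % num_nodes for page in range(num_pages)]


def pages_per_node(placement, num_nodes):
    """Count how many pages ended up on each node."""
    counts = [0] * num_nodes
    for node in placement:
        counts[node] += 1
    return counts


placement = interleave_pages(7, 3)
print(placement)                     # [0, 1, 2, 0, 1, 2, 0]
print(pages_per_node(placement, 3))  # [3, 2, 2]
```

Interleaving evens out the load on each node's memory, at the price of making a share of every task's accesses remote.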
Barriers of NUMA.
• Spread tasks between sockets.
Barriers of NUMA.
• IO NUMA: needs to be considered during placement /
scheduling.
Barriers of NUMA.
• In the 80s, there was just memory. Then CPUs got fast
enough relative to memory that people wanted to add a
cache. It’s bad news if the cache is inconsistent with the
backing store (memory), so the cache has to keep some
information about what it’s holding on to so it knows
if/when it needs to write things to the backing store.
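The bookkeeping described above can be sketched as a tiny write-back cache that keeps a dirty flag per line. This is a simplified single-cache model written for illustration; the class and method names are assumptions, and it does not model coherence between several caches.

```python
class WriteBackCache:
    """Minimal write-back cache: each line remembers whether it is dirty."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict: address -> value (the "memory")
        self.lines = {}                     # address -> [value, dirty flag]

    def read(self, address):
        # On a miss, fill the line from memory and mark it clean.
        if address not in self.lines:
            self.lines[address] = [self.backing_store[address], False]
        return self.lines[address][0]

    def write(self, address, value):
        # Writes stay in the cache; the line is now newer than memory.
        self.lines[address] = [value, True]

    def evict(self, address):
        # Only dirty lines need to be written back to the backing store.
        value, dirty = self.lines.pop(address)
        if dirty:
            self.backing_store[address] = value


memory = {0: 1}
cache = WriteBackCache(memory)
cache.write(0, 5)
print(memory[0])  # 1: memory is stale until the line is evicted
cache.evict(0)
print(memory[0])  # 5
```

Between the write and the eviction the cache and memory disagree, which is exactly the inconsistency that becomes a problem once another processor can read the same address.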
Barriers of NUMA.
• Data requested by more
than one processor.
• How far apart the
processors are from their
associated memory
banks.
Solutions
• Some hardware implementations exist that solve some of these
problems. However, buying a high-end server just to test
new approaches on it is expensive, and such a server needs
special conditions such as cooling and space.
• As developers, we could instead create a simulator in which to
implement different approaches and to analyse and improve their
performance and scalability. This means the simulator needs to
handle both the software and the hardware side: reporting remote
memory access events, calculating the execution time of each
process, handling I/O events, and so on.
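A very small starting point for such a simulator might look like the sketch below. It is not based on any of the existing simulators; the `Process` record, the trace format, and the two latency constants are all assumptions chosen to keep the example short, and I/O events are left out.

```python
from dataclasses import dataclass

LOCAL_NS = 100   # assumed local access latency
REMOTE_NS = 150  # assumed remote access latency


@dataclass
class Process:
    name: str
    cpu_node: int   # node the process runs on
    accesses: list  # memory node touched by each access (hypothetical trace)


def simulate(processes):
    """Report remote memory access events and execution time per process."""
    report = {}
    for proc in processes:
        remote = sum(1 for node in proc.accesses if node != proc.cpu_node)
        local = len(proc.accesses) - remote
        report[proc.name] = {
            "remote_events": remote,
            "time_ns": local * LOCAL_NS + remote * REMOTE_NS,
        }
    return report


result = simulate([Process("a", 0, [0, 0, 1]), Process("b", 1, [1, 1])])
print(result["a"])  # {'remote_events': 1, 'time_ns': 350}
print(result["b"])  # {'remote_events': 0, 'time_ns': 200}
```

Different placement or scheduling approaches can then be compared by feeding the same traces through the model with different `cpu_node` assignments.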
Existing simulators
There are a number of existing projects that could be
named, such as RSIM, SICOSYS, SIMT and simNUMA.
These projects have done a pretty nice job, and each
has its strengths and weaknesses. The work has
started, but there is much more to cover and to
implement in this field.
There are a lot of approaches and theories that need to
be tested and proved or disproved.
For the reasons mentioned above, simulators will play an
important role in the near future.
Benefits of NUMA
As mentioned above, the main benefit is scalability. It is extremely
difficult to scale SMP beyond a certain number of CPUs: past that
point, the memory bus is under heavy contention. NUMA is one
way of reducing the number of CPUs competing for
access to a shared memory bus. This is accomplished
by having several memory buses and only having a
small number of CPUs on each of those buses.
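The arithmetic behind this argument is simple enough to sketch. The function below is illustrative only; it assumes CPUs are spread as evenly as possible over the buses, and the CPU counts in the example are made up.

```python
import math


def cpus_per_bus(total_cpus, num_buses):
    """CPUs competing on the busiest bus when CPUs are spread evenly."""
    return math.ceil(total_cpus / num_buses)


print(cpus_per_bus(64, 1))  # 64: one shared bus, as in a plain SMP
print(cpus_per_bus(64, 8))  # 8: eight NUMA nodes, each with its own bus
```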
I’m interested in things that
CPUs can’t do yet but will be
able to do in the near future.
Thank you

Overview on NUMA
