SlideShare a Scribd company logo
1 of 53
Download to read offline
Memory Management
From Silicon to Algorithm
Sysadmin #7
Adrien Mahieux - Sysadmin & microsecond hunter
gh: github.com/Saruspete
tw: @Saruspete
em: adrien.mahieux@gmail.com
1) HW : Data Storage
2) HW : Data access
3) HW : Data processing
4) SW : Linux Internals
5) SW : Application allocator
6) SEC : Attacks
Agenda
{S,D,V}RAM, Bank, Rank, {u,r,lr,fb}dimm
CAS Timing, ECC
{S,D,Q}DR, NUMA, Channels, CPU (Cache,
Associativity
CPU Pipeline, Branch prediction, MMU
Zones, Buddy allocator, fragmentation, sl[auo]b
Stack / heap, regions, memory allocator..
Malloc implementation, ptmalloc2 details
Where are we ?
How long ?
How often ?
What ratio ?
What path ?
Hardware : Data Storage
HW : Data Storage - Hierarchy
DRAM Chip
x4 = 16 Banks
x8 = 8 Banks
x16 = 4 Banks
Bank
Array
Row Decoder
Row Buffer
Column Decoder
Array Cell
1 bit
HW : Data Storage - {S,D,V}RAM
RAM Random Access Memory
SRAM Static RAM
DRAM Dynamic RAM
VRAM Video RAM
SRAM DRAM
SRAM DRAM
Speed CMOS 2T Cond.
Power Consumption Constant Low + burst
Production Cost Expensive Cheap
Production complexity 5 trans. 1 trans.
Read Operation Stable Destructive
HW : Data Storage - DRAM Refresh
DRAM must be refreshed, even if not accessed.
Done every 4 - 64ms
Refresh can be done by :
- Burst refresh : Stop all operation, refresh all memory
- Distributed refresh : refresh one (or more) row at a time
- Self-Refresh (low-power mode) : Turn off memory controller, and refresh
capacitor itself
HW : Data Storage - Bank
- Row Buffer holds read data
- Read gets entire row into the
buffer
- Once read, capacitor is
empty, but value in buffer
- Write bits back before doing
another read
Process is called “opening” or
“closing” a row
0 1 0 1 0 0 1 0 0 0 1 1 1 0 1 0
0 1 0 1 0 0 1 0 0 0 1 1 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
HW : Data Storage - Rank
Rank : Set of DRAM Modules on a DIMM connected to the same
Chip-Select pin (accessed simultaneously)
A Rank has 64 bit wide data bus (72 on DIMM with ECC) and noted
<rank>Rx<dram-width>:
- 1Rx16 : Single Rank, 16bits width (4 DRAM to have 64bits)
- 2Rx8 : Dual Rank, 8bits width (8 DRAM to have 64bits)
- 4Rx8 : Quad Rank, 8bits width (8 DRAM to have 64bits)
DIMM Dual Inline Memory Module ( != SIMM)
SO-DIMM Small Outline DIMM (for laptop & embedded)
UDIMM Unregistered DIMM (standard end-user)
RDIMM Registered DIMM or Buffered DIMM (most servers).
FB-DIMM Fully Buffered DIMM : Buffer for Addr & Data bus.
LR-DIMM Load-Reduced DIMM : Like FB, but only serialize data, not address
ECC Error Correcting Code : Parity value checking (like RAID5)
HW : Data Storage - {U,R,LR,FB}DIMM
HW : Data Storage - {U,R,LR,FB}DIMM
Latency Max Size
(1 DIMM)
@ Bus Data Bus Implementation Details
UDIMM Low 8 GB Parallel Parallel Input command and output data bus are
directly connected to the bus.
RDIMM UDIMM + 1
cycle
32 GB Parallel Parallel Same as UDIMM, but input commands are
stabilized through a register (cost of 1 cycle)
FB-DIMM 8 GB Serial Serial Add a big buffer for both command and data.
But serial implementation generates
hyperfrequency and signal stability issues.
LR-DIMM Similiar to
RDIMM
128 GB Parallel Parallel Fix issues of FBDIMM. Based on RDIMM and
also buffers data-lines.
HW : Data Storage - RAM Timings
Row and Column addresses are sent
on the same (address) bus.
Multiplexer on the memory DIMM
Notation : w-x-y-z-T
- w : CAS Latency (CL)
- x : RAS to CAS delay (TRCD)
- y : RAS precharge (TRP)
- z : Active to Precharge delay (TRAS)
- T : Command Rate
Timings are in cycles
CL : Column select → Data avail. on bus
TRCD : Row select → Column select
TRP : New line activation (opening)
TRAS : Line deactivation (closing)
T : Between 2 commands
TRRD RAS to RAS. Time to activate the next bank of memory.
TWTR Write to Read. Between write command and the next read command.
TWR Write Recovery. Time after a valid write operation and precharge.
TRFC Row Refresh Cycle. to refresh a row on a memory bank.
TRTP Read To Precharge. between a read command to a row pre-charge
command to the same rank.
TRTW Read to Write Delay. When a write command is received, cycles for the
command to be executed.
TRC Row Cycle. The minimum time in cycles it takes a row to complete a full
cycle. This can be determined by; tRC = tRAS + tRP.
HW : Data Storage - RAM Timings (advanced)
TREF before a charge is refreshed so it does not lose its charge and corrupt.
TWCL Write CAS number. Write to whatever bank is open to be written too.
CPC Command Per Clock. chip select is executed then commands are
issued.TRD Static tREAD.
HW : Data Storage - RAM Timings (advanced)
HW : Data Storage - DIMM Assembly
x8 ⇒ Each DRAM outputs 8 bits
HW : Data Storage - 3D X-Point
Public name : Optane
- Low Latency (< 10us)
- High Density
- No memory controller
- Voltage Variation
Hardware : Data Access
HW : Data Access - {S,D,Q}DR
The quantity of usable data handled by the memory for every Clock Cycle.
Introduced the concept of “Tps” : Transfers per second
Original DRAM : specify RAS + CAS for every operation.
FPM (1990) : multiple reads from the same row without RAS.
EDO (1995): allows to select next column while reading old one
SDR (1997) : single selection then burst following
DDR (2000) : transfers data on both rising and falling edge of the clock
DDR2 (2003) : 2 internal channels
DDR3 (2007) : Doubled transfer speed
DDR4 (2013) : Increased frequency
DDR5 (2020) : JEDEC released specs…
HW : Data Access - UMA / NUMA
Uniform Memory Access
Central Northbridge
Non Uniform Memory Access
One MMU for each Socket
HW : Data Access - Memory Channels
DIMMs used in parallel to increase the
bandwidth : single / dual / triple channel
Channels must be balanced
HW : Data Access - Direct Memory Access
Bypass CPU processing
- PCI-E
- Thunderbolt
- Firewire
- Cardbus
- Expresscard
DMA controller
advertise caches about
RAM changes
Direct Cache Access
Hardware : Data Processing
HW : Data Processing - CPU Pipeline
HW : Data Processing - Cache
DRAM is slow related to CPU cycles
⇒ Let’s use cache
Can be used for read (prefetch) Write
(write-back)
Eviction done by tracking algorithm
- LRU Least Recently Used
- LFU Least Frequently Used
- FIFO First In First Out
- ARC Adaptive Replacement Cache
Hit-Ratio gives usefulness of cache
HW : Data Processing - Cache
Distribution policy :
- Fully Associative : All blocks checked simultaneously (heavy hardware)
- Direct Mapped : fast but need balanced spread (rare)
- Set-Associative : mix of 2 previous
Address can be :
- Virtual : Fast access but not unique. Used by L1 and TLB
- Physical : Calculation needed but unique. Used for other caches
Programmers : avoid mapping same @Phys on multiple @Virt
HW : Data Processing - Cache Coherency
Multiple CPUs ⇒ Multiple caches ⇒
SYNC
MOESI (Modified Owned Exclusive
Shared Invalid) on NUMA systems
Processors use cache snooping
Request For Ownership ⇒ Very costly
When a CPU changes a data already in
cache of another CPU
HW : Data Processing - Writing
Write policy on memory zones done by MTRR (Memory Type Range Register)
Write Through : All data written in cache is also written in memory
Write Back : Delay memory access as long as possible
Write Combining : Force writes to be grouped in bulk
Uncacheable : For some I/O and HW, like BIOS, ACPI, IOAPIC...
HW : Data Processing - Memory Management Unit
Switch Between @Virtual and @Physical ⇒ Translation done by CPU (MMU)
Not directly mapped: To @ 16EB of 64bits, direct array would be huge !
We only use 48bits (up to 256TB) and Page Tables to avoid management waste.
4 cascading tables: Page {Global,Upper,Middle} Directory and Page Table
HW : Data Processing - Memory Management Unit
Page Walking :
1) @Base for L4
2) Add offset from bits 39-47
⇒ Got @Base for L3
3) Add offset from bits 30-38
⇒ Got @Base for L2
4) Add offset from bits 21-29
⇒ Got @Base for L1
5) Add offset from bits 12-20
⇒ Got @Base for Page
6) Add offset from bits 0-11
⇒ @Physical
HW : Data Processing - Memory Management Unit
An empty Page Directory stores 512 (29
) entries ⇒ 64bits * 512 = 4KB
For 32KB (4 * 512) ⇒ @ 2MB
For 2MB (3*512 + 512*512) ⇒ @ 1GB
For 128MB (2*512 + 5123
) ⇒ @ 550GB
⇒ Low overhead for storage... but requires 4 reads.
⇒ let’s cache these translation : Translation Lookaside Buffer (TLB)
Limit TLB flush upon context switching by adding the page-table ID to the TLB entry
Software : Linux Internals
Name Size (x86) Size (x86_64) Description
DMA < 16MB < 16MB For very old devices (@24 bits)
DMA32 N/A 16 - 4096MB For devices addressing up to 32bits (4GB)
NORMAL 16 - 896MB > 4096MB Memory directly mapped by Kernel
HIGHMEM > 896MB N/A
SW : Linux Internals - Zones
SW : Linux Internals - Zones
32bits : 3/1 split (or 2/2 or 1/3) between Userspace & Kernel
On these 1GB of kernel space, 128MB used to map higher pages. 1024 - 128 = 896
Low memory : directly addressable by Kernel
High memory : must use the 128MB indirection table to be addressed
64bit : all space directly addressable by MMU
SW : Linux Internals - Zones
Jul 12 22:13:12 [server] kernel: swapper: page allocation failure. order:2, mode:0x4020
Jul 12 22:46:46 [server] kernel: [app_name]: page allocation failure. order:4, mode:0xd0
include/linux/gfp.h : Zone usage per flag
* bit result
* =================
* 0x0 => NORMAL
* 0x1 => DMA or NORMAL
* 0x2 => HIGHMEM or NORMAL
* 0x3 => BAD (DMA+HIGHMEM)
* 0x4 => DMA32 or DMA or NORMAL
* 0x5 => BAD (DMA+DMA32)
* 0x6 => BAD (HIGHMEM+DMA32)
* 0x7 => BAD (HIGHMEM+DMA32+DMA)
* 0x8 => NORMAL (MOVABLE+0)
* 0x9 => DMA or NORMAL (MOVABLE+DMA)
* 0xa => MOVABLE (Movable is valid only if HIGHMEM is set too)
* 0xb => BAD (MOVABLE+HIGHMEM+DMA)
* 0xc => DMA32 (MOVABLE+DMA32)
* 0xd => BAD (MOVABLE+DMA32+DMA)
* 0xe => BAD (MOVABLE+DMA32+HIGHMEM)
* 0xf => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
SW : Linux Internals - Buddy Allocator
Goal : limit External Fragmentation and Internal Fragmentation
SW : Linux Internals - Buddy Allocator
4K pages are grouped by 29
(2MB) or 210
(4MB) blocks
Blocks are then cut in half (2 buddies) to service the request
Upon release, tries to merge buddy pages back together
SW : Linux Internals - Buddy Allocator
Unmovable : Locked in memory
Reclaimable : Reusable after clean
Movable : Immediately available
Reserve : Last resort reserve
Isolate : keep on local NUMA node
CMA : Contiguous Memory
Allocator, for DMA devices with
large contiguous
cat /proc/pagetypeinfo
Page block order: 10
Pages per block: 1024
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8
9 10
Node 0, zone DMA, type Unmovable 1 1 0 0 2 1 1 0 1
1 0
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0
0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0
0 2
Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0
0 1
Node 0, zone DMA32, type Unmovable 8 6 0 0 0 0 0 0 0
0 0
Node 0, zone DMA32, type Reclaimable 376 2817 5 0 0 0 0 0 0
0 0
Node 0, zone DMA32, type Movable 6323 12025 287 0 0 0 0 0 0
0 0
Node 0, zone DMA32, type Reserve 0 0 1 4 6 2 0 0 1
1 0
Node 0, zone Normal, type Unmovable 2611 137 0 0 0 0 0 0 0
0 0
Node 0, zone Normal, type Reclaimable 33847 4321 144 0 0 0 0 0 0
0 0
Node 0, zone Normal, type Movable 37312 9849 1097 0 0 0 0 0 0
0 0
Node 0, zone Normal, type Reserve 0 0 5 1 0 1 1 2 1
0 1
Number of blocks type Unmovable Reclaimable Movable Reserve
Node 0, zone DMA 1 0 2 1
Node 0, zone DMA32 13 18 796 1
When no space is available, buddy-allocator will call kswapd
SW : Linux Internals - SLAB
SLAB = Allocator for Kernel Objects
Uses cache to avoid fragmentation
Each kernel object is stored in a SLAB
SLAB 1 Queue / NUMA Node
SLUB 1 Queue / CPU
SLOB As compact as possible
Most servers use SLUB :
- Defrag
- Debug
SW : Linux Internals - SLAB
# grep -v '# name' /proc/slabinfo | tr -d '#' | column -t
slabinfo - version: 2.1
[ . . . ]
ext4_inode_cache 294150 294210 1072 30 8 : tunables 0 0 0 : slabdata 9807 9807 0
ext4_allocation_context 224 224 128 32 1 : tunables 0 0 0 : slabdata 7 7 0
ext4_io_end 1472 1472 64 64 1 : tunables 0 0 0 : slabdata 23 23 0
ext4_extent_status 143365 221442 40 102 1 : tunables 0 0 0 : slabdata 2171 2171 0
[ . . . ]
inode_cache 13493 15596 584 28 4 : tunables 0 0 0 : slabdata 557 557 0
dentry 449376 450471 192 21 1 : tunables 0 0 0 : slabdata 21451 21451 0
buffer_head 650327 840294 104 39 1 : tunables 0 0 0 : slabdata 21546 21546 0
task_struct 1299 1336 7936 4 8 : tunables 0 0 0 : slabdata 334 334 0
cred_jar 88986 89271 192 21 1 : tunables 0 0 0 : slabdata 4251 4251 0
[ . . . ]
task_struct (large object)
Object size 7936 B
4 Objects / slab 31744 B
8 Pages / slab 32768 B
Loss : 32768 - 31744 = 1024 B / slab
334 slabs 334 KB lost / 10688 KB (3.125%)
ext4_extent_status (small and compact)
Object size 40 B
102 obj / slab 4080 B
1 Page / slab 4096 B
Loss : 4096 - 4080 = 16B /slab (0.39%)
2171 slabs 334 KB lost / 8892 KB ()
cat /proc/meminfo
MemTotal: 16328616 kB
MemFree: 4021720 kB
MemAvailable: 6653544 kB
Buffers: 380220 kB
Cached: 4688968 kB
SwapCached: 0 kB
Active: 7703764 kB
Inactive: 3890964 kB
Active(anon): 6546524 kB
Inactive(anon): 2354396 kB
Active(file): 1157240 kB
Inactive(file): 1536568 kB
Unevictable: 20988 kB
Mlocked: 20988 kB
SwapTotal: 8191996 kB
SwapFree: 8191996 kB
Dirty: 92 kB
Writeback: 0 kB
AnonPages: 6546580 kB
Mapped: 1763640 kB
Shmem: 2368152 kB
Slab: 383576 kB
SReclaimable: 280636 kB
SUnreclaim: 102940 kB
KernelStack: 19632 kB
PageTables: 115868 kB
[ ... ]
SW : Linux Internals - /proc
SW : Linux Internals - /proc
pmap -X $PID # read /proc/$PID/smaps
21797: /bin/bash
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked Mapping
5576c6639000 r-xp 00000000 fd:04 268557 992 912 80 912 0 0 0 0 0 0 0 bash
5576c6930000 r--p 000f7000 fd:04 268557 16 16 16 16 16 0 0 0 0 0 0 bash
5576c6934000 rw-p 000fb000 fd:04 268557 36 36 36 36 36 0 0 0 0 0 0 bash
5576c693d000 rw-p 00000000 00:00 0 20 20 20 20 20 0 0 0 0 0 0
5576c806d000 rw-p 00000000 00:00 0 1508 1472 1472 1472 1472 0 0 0 0 0 0 [heap]
7f740b9ea000 r-xp 00000000 fd:04 267260 44 44 0 44 0 0 0 0 0 0 0 libnss_files-2.22.so
7f740b9f5000 ---p 0000b000 fd:04 267260 2044 0 0 0 0 0 0 0 0 0 0 libnss_files-2.22.so
7f740bbf4000 r--p 0000a000 fd:04 267260 4 4 4 4 4 0 0 0 0 0 0 libnss_files-2.22.so
7f740bbf5000 rw-p 0000b000 fd:04 267260 4 4 4 4 4 0 0 0 0 0 0 libnss_files-2.22.so
7f740bbf6000 rw-p 00000000 00:00 0 24 0 0 0 0 0 0 0 0 0 0
7f740bbfc000 r--p 00000000 fd:04 267226 109328 504 19 504 0 0 0 0 0 0 0 locale-archive
7f74126c0000 r-xp 00000000 fd:04 267234 1756 1536 10 1536 0 0 0 0 0 0 0 libc-2.22.so
7f7412877000 ---p 001b7000 fd:04 267234 2048 0 0 0 0 0 0 0 0 0 0 libc-2.22.so
7f7412a77000 r--p 001b7000 fd:04 267234 16 16 16 16 16 0 0 0 0 0 0 libc-2.22.so
7f7412a7b000 rw-p 001bb000 fd:04 267234 8 8 8 8 8 0 0 0 0 0 0 libc-2.22.so
7f7412a7d000 rw-p 00000000 00:00 0 16 12 12 12 12 0 0 0 0 0 0
7f7412a81000 r-xp 00000000 fd:04 267240 12 12 0 12 0 0 0 0 0 0 0 libdl-2.22.so
7f7412a84000 ---p 00003000 fd:04 267240 2044 0 0 0 0 0 0 0 0 0 0 libdl-2.22.so
7f7412c83000 r--p 00002000 fd:04 267240 4 4 4 4 4 0 0 0 0 0 0 libdl-2.22.so
7f7412c84000 rw-p 00003000 fd:04 267240 4 4 4 4 4 0 0 0 0 0 0 libdl-2.22.so
7f7412c85000 r-xp 00000000 fd:04 270666 152 152 12 152 0 0 0 0 0 0 0 libtinfo.so.5.9
7f7412cab000 ---p 00026000 fd:04 270666 2044 0 0 0 0 0 0 0 0 0 0 libtinfo.so.5.9
7f7412eaa000 r--p 00025000 fd:04 270666 16 16 16 16 16 0 0 0 0 0 0 libtinfo.so.5.9
7f7412eae000 rw-p 00029000 fd:04 270666 4 4 4 4 4 0 0 0 0 0 0 libtinfo.so.5.9
7f7412eaf000 r-xp 00000000 fd:04 267194 132 132 0 132 0 0 0 0 0 0 0 ld-2.22.so
7f74130a3000 rw-p 00000000 00:00 0 20 20 20 20 20 0 0 0 0 0 0
7f74130a8000 r--p 00000000 fd:04 1442509 124 64 9 64 0 0 0 0 0 0 0 bash.mo
7f74130c7000 r--s 00000000 fd:04 527264 28 28 0 28 0 0 0 0 0 0 0 gconv-modules.cache
7f74130ce000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0
7f74130cf000 r--p 00020000 fd:04 267194 4 4 4 4 4 0 0 0 0 0 0 ld-2.22.so
7f74130d0000 rw-p 00021000 fd:04 267194 4 4 4 4 4 0 0 0 0 0 0 ld-2.22.so
7f74130d1000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0
7ffc4fbbe000 rw-p 00000000 00:00 0 136 32 32 32 32 0 0 0 0 0 0 [stack]
7ffc4fbe8000 r--p 00000000 00:00 0 8 0 0 0 0 0 0 0 0 0 0 [vvar]
7ffc4fbea000 r-xp 00000000 00:00 0 8 4 0 4 0 0 0 0 0 0 0 [vdso]
ffffffffff600000 r-xp 00000000 00:00 0 4 0 0 0 0 0 0 0 0 0 0 [vsyscall]
====== ==== ==== ========== ========= ============== ============== =============== ==== ======= ======
122620 5072 1814 5072 1684 0 0 0 0 0 0 KB
Software : Application Memory Allocator
SW : Application Memory Allocator
SW : Application Memory Allocator
SW : Application Memory Allocator
malloc NOT A SYSCALL (mmap + brk)
Speed & minimal fragmentation
Multiple implementations :
- dlmalloc Doug-Lea (Generic)
- ptmalloc2 Current glibc
- jemalloc Jason Evans (FreeBSD, Firefox, FB)
- tcmalloc Thread-Caching (Google)
- libumem Solaris
- €€€€€ lockless, hoard, smartheap...
SW : Application Memory Allocator - ptmalloc2
Uses brk or mmap to do allocation
- brk / sbrk for main thread and if req < 128KB
- mmap otherwise.
Maintains arenas : main & thread
Each arena is composed of shards
Upon free() ptmalloc adds freed region to a “bin” to
be reused for later allocations:
- Fast 16 - 80 bytes
- Unsorted No size limit. Latest freed.
- Small < 512 bytes
- Large >= 512 bytes
SW : Application Memory Allocator - ptmalloc2
Internal Structures :
- malloc_state : Arena Header. Has multiple heaps,
except for Main Arena (which just grows its heap)
- heap_info : Heap Header. Has multiple chunks
- malloc_chunk : Chunk Header. Result of malloc()
Security : Attacks
Security : Some Attacks
Rowhammer Bitflip on DRAM Rows
Evil Maid DMA using physical access
Stack Clash Huge stack usage to overlap on heap
Bibliography
Bibliography - Hardware
https://compas.cs.stonybrook.edu/~nhonarmand/courses/sp15/cse502/slides/06-main_mem.pdf
http://www.eng.utah.edu/~cs7810/pres/11-7810-12.pdf
http://slideplayer.fr/slide/3279682/
https://forums.tweaktown.com/gigabyte/27283-memory-timings-explained-suggested-timings-memset-vs-bios.html
http://www.masterslair.com/memory-ram-timings-latency-cas-ras-tcl-trcd-trp-tras
https://www.slideshare.net/abhilash128/lec-21-16642228
https://en.wikichip.org
http://www.overclockingmadeinfrance.com/quest-ce-que-les-cas-latency/
https://stackoverflow.com/questions/29522431/does-a-branch-misprediction-flush-the-entire-pipeline-even-for-very-short-if-st
https://kshitizdange.github.io/418CacheSim/final-report
http://www.lifl.fr/~marquet/cnl/ssam/ssam-c3.pdf
http://www.toves.org/books/cache/
http://www.simmtester.com/page/news/showpubnews.asp?num=168
http://wiki.osdev.org/X86-64
Bibliography - Software
https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/
http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory/
http://duartes.org/gustavo/blog/post/memory-translation-and-segmentation/
http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/
http://www.memorymanagement.org/mmref/index.html#mmref-intro
http://events.linuxfoundation.org/sites/events/files/slides/slaballocators.pdf
http://iarchsys.com/?p=764
Questions ?

More Related Content

What's hot

Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBshimosawa
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)shimosawa
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernelVadim Nikitin
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...Adrian Huang
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux KernelAdrian Huang
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)shimosawa
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory modelSeongJae Park
 
Memory organization (Computer architecture)
Memory organization (Computer architecture)Memory organization (Computer architecture)
Memory organization (Computer architecture)Sandesh Jonchhe
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdfAdrian Huang
 
Advanced computer architechture -Memory Hierarchies and its Properties and Type
Advanced computer architechture -Memory Hierarchies and its Properties and TypeAdvanced computer architechture -Memory Hierarchies and its Properties and Type
Advanced computer architechture -Memory Hierarchies and its Properties and TypeLalfakawmaKh
 
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIKernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIAnne Nicolas
 

What's hot (20)

Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
 
Overlayfs and VFS
Overlayfs and VFSOverlayfs and VFS
Overlayfs and VFS
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
 
U-Boot - An universal bootloader
U-Boot - An universal bootloader U-Boot - An universal bootloader
U-Boot - An universal bootloader
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
 
Kernel Debugging & Profiling
Kernel Debugging & ProfilingKernel Debugging & Profiling
Kernel Debugging & Profiling
 
Cache memory
Cache memory Cache memory
Cache memory
 
Linux dma engine
Linux dma engineLinux dma engine
Linux dma engine
 
spinlock.pdf
spinlock.pdfspinlock.pdf
spinlock.pdf
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory model
 
Memory organization (Computer architecture)
Memory organization (Computer architecture)Memory organization (Computer architecture)
Memory organization (Computer architecture)
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
 
Advanced computer architechture -Memory Hierarchies and its Properties and Type
Advanced computer architechture -Memory Hierarchies and its Properties and TypeAdvanced computer architechture -Memory Hierarchies and its Properties and Type
Advanced computer architechture -Memory Hierarchies and its Properties and Type
 
BeagleBone Black Bootloaders
BeagleBone Black BootloadersBeagleBone Black Bootloaders
BeagleBone Black Bootloaders
 
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIKernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
 
USB Drivers
USB DriversUSB Drivers
USB Drivers
 

Similar to Here are the key points about the Linux buddy allocator:- It manages physical memory at a page level (typically 4KB pages)- Pages are grouped into blocks of a size that is a power of two (e.g. 2MB, 4MB) - Blocks can be split in half recursively to serve allocation requests of different sizes- Each block has a "buddy" that it was split from or that was created by splitting it- To allocate a page, the allocator searches for a free block of the required size - If none exists, it coalesces free pages from smaller buddies until the size is met- Fragmentation is reduced by always splitting the largest free

Chapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworldChapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworldPraveen Kumar
 
Computer organization memory
Computer organization memoryComputer organization memory
Computer organization memoryDeepak John
 
memeoryorganization PPT for organization of memories
memeoryorganization PPT for organization of memoriesmemeoryorganization PPT for organization of memories
memeoryorganization PPT for organization of memoriesGauravDaware2
 
Linux memory-management-kamal
Linux memory-management-kamalLinux memory-management-kamal
Linux memory-management-kamalKamal Maiti
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Hsien-Hsin Sean Lee, Ph.D.
 
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET Journal
 
Computer Organisation and Architecture
Computer Organisation and ArchitectureComputer Organisation and Architecture
Computer Organisation and ArchitectureSubhasis Dash
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization2022002857mbit
 
COMPUTER ORGANIZATION NOTES Unit 5
COMPUTER ORGANIZATION NOTES Unit 5COMPUTER ORGANIZATION NOTES Unit 5
COMPUTER ORGANIZATION NOTES Unit 5Dr.MAYA NAYAK
 
Microelectronics U4.pptx.ppt
Microelectronics U4.pptx.pptMicroelectronics U4.pptx.ppt
Microelectronics U4.pptx.pptPavikaSharma3
 
Kiến trúc máy tính - COE 301 - Memory.ppt
Kiến trúc máy tính - COE 301 - Memory.pptKiến trúc máy tính - COE 301 - Memory.ppt
Kiến trúc máy tính - COE 301 - Memory.pptTriTrang4
 
ARM architcture
ARM architcture ARM architcture
ARM architcture Hossam Adel
 

Similar to Here are the key points about the Linux buddy allocator:- It manages physical memory at a page level (typically 4KB pages)- Pages are grouped into blocks of a size that is a power of two (e.g. 2MB, 4MB) - Blocks can be split in half recursively to serve allocation requests of different sizes- Each block has a "buddy" that it was split from or that was created by splitting it- To allocate a page, the allocator searches for a free block of the required size - If none exists, it coalesces free pages from smaller buddies until the size is met- Fragmentation is reduced by always splitting the largest free (20)

Chapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworldChapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworld
 
Computer organization memory
Computer organization memoryComputer organization memory
Computer organization memory
 
memeoryorganization PPT for organization of memories
memeoryorganization PPT for organization of memoriesmemeoryorganization PPT for organization of memories
memeoryorganization PPT for organization of memories
 
Linux memory-management-kamal
Linux memory-management-kamalLinux memory-management-kamal
Linux memory-management-kamal
 
Memory (Part 3)
Memory (Part 3)Memory (Part 3)
Memory (Part 3)
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
 
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
 
Computer Organisation and Architecture
Computer Organisation and ArchitectureComputer Organisation and Architecture
Computer Organisation and Architecture
 
L05 parallel
L05 parallelL05 parallel
L05 parallel
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 
COMPUTER ORGANIZATION NOTES Unit 5
COMPUTER ORGANIZATION NOTES Unit 5COMPUTER ORGANIZATION NOTES Unit 5
COMPUTER ORGANIZATION NOTES Unit 5
 
05 Internal Memory
05  Internal  Memory05  Internal  Memory
05 Internal Memory
 
Microelectronics U4.pptx.ppt
Microelectronics U4.pptx.pptMicroelectronics U4.pptx.ppt
Microelectronics U4.pptx.ppt
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
Memoryhierarchy
MemoryhierarchyMemoryhierarchy
Memoryhierarchy
 
Dd sdram
Dd sdramDd sdram
Dd sdram
 
Kiến trúc máy tính - COE 301 - Memory.ppt
Kiến trúc máy tính - COE 301 - Memory.pptKiến trúc máy tính - COE 301 - Memory.ppt
Kiến trúc máy tính - COE 301 - Memory.ppt
 
ARM architcture
ARM architcture ARM architcture
ARM architcture
 
Coa presentation3
Coa presentation3Coa presentation3
Coa presentation3
 
DDR DIMM Design
DDR DIMM DesignDDR DIMM Design
DDR DIMM Design
 

Recently uploaded

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Here are the key points about the Linux buddy allocator:- It manages physical memory at a page level (typically 4KB pages)- Pages are grouped into blocks of a size that is a power of two (e.g. 2MB, 4MB) - Blocks can be split in half recursively to serve allocation requests of different sizes- Each block has a "buddy" that it was split from or that was created by splitting it- To allocate a page, the allocator searches for a free block of the required size - If none exists, it coalesces free pages from smaller buddies until the size is met- Fragmentation is reduced by always splitting the largest free

  • 1. Memory Management From Silicon to Algorithm Sysadmin #7 Adrien Mahieux - Sysadmin & microsecond hunter gh: github.com/Saruspete tw: @Saruspete em: adrien.mahieux@gmail.com
  • 2. 1) HW : Data Storage 2) HW : Data access 3) HW : Data processing 4) SW : Linux Internals 5) SW : Application allocator 6) SEC : Attacks Agenda {S,D,V}RAM, Bank, Rank, {u,r,lr,fb}dimm CAS Timing, ECC {S,D,Q}DR, NUMA, Channels, CPU (Cache, Associativity CPU Pipeline, Branch prediction, MMU Zones, Buddy allocator, fragmentation, sl[auo]b Stack / heap, regions, memory allocator.. Malloc implementation, ptmalloc2 details
  • 3. Where are we ? How long ? How often ? What ratio ? What path ?
  • 4. Hardware : Data Storage
  • 5. HW : Data Storage - Hierarchy DRAM Chip x4 = 16 Banks x8 = 8 Banks x16 = 4 Banks Bank Array Row Decoder Row Buffer Column Decoder Array Cell 1 bit
  • 6. HW : Data Storage - {S,D,V}RAM RAM Random Access Memory SRAM Static RAM DRAM Dynamic RAM VRAM Video RAM SRAM DRAM SRAM DRAM Speed CMOS 2T Cond. Power Consumption Constant Low + burst Production Cost Expensive Cheap Production complexity 5 trans. 1 trans. Read Operation Stable Destructive
  • 7. HW : Data Storage - DRAM Refresh DRAM must be refreshed, even if not accessed. Done every 4 - 64ms Refresh can be done by : - Burst refresh : Stop all operation, refresh all memory - Distributed refresh : refresh one (or more) row at a time - Self-Refresh (low-power mode) : Turn off memory controller, and refresh capacitor itself
  • 8. HW : Data Storage - Bank - Row Buffer holds read data - Read gets entire row into the buffer - Once read, capacitor is empty, but value in buffer - Write bits back before doing another read Process is called “opening” or “closing” a row 0 1 0 1 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  • 9. HW : Data Storage - Rank Rank : Set of DRAM Modules on a DIMM connected to the same Chip-Select pin (accessed simultaneously) A Rank has 64 bit wide data bus (72 on DIMM with ECC) and noted <rank>Rx<dram-width>: - 1Rx16 : Single Rank, 16bits width (4 DRAM to have 64bits) - 2Rx8 : Dual Rank, 8bits width (8 DRAM to have 64bits) - 4Rx8 : Quad Rank, 8bits width (8 DRAM to have 64bits)
  • 10. DIMM Dual Inline Memory Module ( != SIMM) SO-DIMM Small Outline DIMM (for laptop & embedded) UDIMM Unregistered DIMM (standard end-user) RDIMM Registered DIMM or Buffered DIMM (most servers). FB-DIMM Fully Buffered DIMM : Buffer for Addr & Data bus. LR-DIMM Load-Reduced DIMM : Like FB, but only serialize data, not address ECC Error Correcting Code : Parity value checking (like RAID5) HW : Data Storage - {U,R,LR,FB}DIMM
  • 11. HW : Data Storage - {U,R,LR,FB}DIMM Latency Max Size (1 DIMM) @ Bus Data Bus Implementation Details UDIMM Low 8 GB Parallel Parallel Input command and output data bus are directly connected to the bus. RDIMM UDIMM + 1 cycle 32 GB Parallel Parallel Same as UDIMM, but input commands are stabilized through a register (cost of 1 cycle) FB-DIMM 8 GB Serial Serial Add a big buffer for both command and data. But serial implementation generates hyperfrequency and signal stability issues. LR-DIMM Similiar to RDIMM 128 GB Parallel Parallel Fix issues of FBDIMM. Based on RDIMM and also buffers data-lines.
  • 12. HW : Data Storage - RAM Timings Row and Column addresses are sent on the same (address) bus. Multiplexer on the memory DIMM Notation : w-x-y-z-T - w : CAS Latency (CL) - x : RAS to CAS delay (TRCD) - y : RAS precharge (TRP) - z : Active to Precharge delay (TRAS) - T : Command Rate Timings are in cycles CL : Column select → Data avail. on bus TRCD : Row select → Column select TRP : New line activation (opening) TRAS : Line deactivation (closing) T : Between 2 commands
  • 13. TRRD RAS to RAS. Time to activate the next bank of memory. TWTR Write to Read. Between write command and the next read command. TWR Write Recovery. Time after a valid write operation and precharge. TRFC Row Refresh Cycle. to refresh a row on a memory bank. TRTP Read To Precharge. between a read command to a row pre-charge command to the same rank. TRTW Read to Write Delay. When a write command is received, cycles for the command to be executed. TRC Row Cycle. The minimum time in cycles it takes a row to complete a full cycle. This can be determined by; tRC = tRAS + tRP. HW : Data Storage - RAM Timings (advanced)
  • 14. TREF before a charge is refreshed so it does not lose its charge and corrupt. TWCL Write CAS number. Write to whatever bank is open to be written too. CPC Command Per Clock. chip select is executed then commands are issued.TRD Static tREAD. HW : Data Storage - RAM Timings (advanced)
  • 15. HW : Data Storage - DIMM Assembly x8 ⇒ Each DRAM outputs 8 bits
  • 16. HW : Data Storage - 3D X-Point Public name : Optane - Low Latency (< 10us) - High Density - No memory controller - Voltage Variation
  • 17. Hardware : Data Access
  • 18. HW : Data Access - {S,D,Q}DR The quantity of usable data handled by the memory for every Clock Cycle. Introduced the concept of “Tps” : Transfers per second Original DRAM : specify RAS + CAS for every operation. FPM (1990) : multiple reads from the same row without RAS. EDO (1995): allows to select next column while reading old one SDR (1997) : single selection then burst following DDR (2000) : transfers data on both rising and falling edge of the clock DDR2 (2003) : 2 internal channels DDR3 (2007) : Doubled transfer speed DDR4 (2013) : Increased frequency DDR5 (2020) : JEDEC released specs…
  • 19. HW : Data Access - UMA / NUMA Uniform Memory Access Central Northbridge Non Uniform Memory Access One MMU for each Socket
  • 20. HW : Data Access - Memory Channels DIMMs used in parallel to increase the bandwidth : single / dual / triple channel Channels must be balanced
  • 21. HW : Data Access - Direct Memory Access Bypass CPU processing - PCI-E - Thunderbolt - Firewire - Cardbus - Expresscard DMA controller advertise caches about RAM changes Direct Cache Access
  • 22. Hardware : Data Processing
  • 23. HW : Data Processing - CPU Pipeline
  • 24. HW : Data Processing - Cache DRAM is slow related to CPU cycles ⇒ Let’s use cache Can be used for read (prefetch) Write (write-back) Eviction done by tracking algorithm - LRU Least Recently Used - LFU Least Frequently Used - FIFO First In First Out - ARC Adaptive Replacement Cache Hit-Ratio gives usefulness of cache
  • 25. HW : Data Processing - Cache Distribution policy : - Fully Associative : All blocks checked simultaneously (heavy hardware) - Direct Mapped : fast but need balanced spread (rare) - Set-Associative : mix of 2 previous Address can be : - Virtual : Fast access but not unique. Used by L1 and TLB - Physical : Calculation needed but unique. Used for other caches Programmers : avoid mapping same @Phys on multiple @Virt
  • 26. HW : Data Processing - Cache Coherency Multiple CPUs ⇒ Multiple caches ⇒ SYNC MOESI (Modified Owned Exclusive Shared Invalid) on NUMA systems Processors use cache snooping Request For Ownership ⇒ Very costly When a CPU changes a data already in cache of another CPU
  • 27. HW : Data Processing - Writing Write policy on memory zones done by MTRR (Memory Type Range Register) Write Through : All data written in cache is also written in memory Write Back : Delay memory access as long as possible Write Combining : Force writes to be grouped in bulk Uncacheable : For some I/O and HW, like BIOS, ACPI, IOAPIC...
  • 28. HW : Data Processing - Memory Management Unit Switch Between @Virtual and @Physical ⇒ Translation done by CPU (MMU) Not directly mapped: To @ 16EB of 64bits, direct array would be huge ! We only use 48bits (up to 256TB) and Page Tables to avoid management waste. 4 cascading tables: Page {Global,Upper,Middle} Directory and Page Table
  • 29. HW : Data Processing - Memory Management Unit Page Walking : 1) @Base for L4 2) Add offset from bits 39-47 ⇒ Got @Base for L3 3) Add offset from bits 30-38 ⇒ Got @Base for L2 4) Add offset from bits 21-29 ⇒ Got @Base for L1 5) Add offset from bits 12-20 ⇒ Got @Base for Page 6) Add offset from bits 0-11 ⇒ @Physical
  • 30. HW : Data Processing - Memory Management Unit An empty Page Directory stores 512 (29 ) entries ⇒ 64bits * 512 = 4KB For 32KB (4 * 512) ⇒ @ 2MB For 2MB (3*512 + 512*512) ⇒ @ 1GB For 128MB (2*512 + 5123 ) ⇒ @ 550GB ⇒ Low overhead for storage... but requires 4 reads. ⇒ let’s cache these translation : Translation Lookaside Buffer (TLB) Limit TLB flush upon context switching by adding the page-table ID to the TLB entry
  • 31. Software : Linux Internals
  • 32. Name Size (x86) Size (x86_64) Description DMA < 16MB < 16MB For very old devices (@24 bits) DMA32 N/A 16 - 4096MB For devices addressing up to 32bits (4GB) NORMAL 16 - 896MB > 4096MB Memory directly mapped by Kernel HIGHMEM > 896MB N/A SW : Linux Internals - Zones
  • 33. SW : Linux Internals - Zones 32bits : 3/1 split (or 2/2 or 1/3) between Userspace & Kernel On these 1GB of kernel space, 128MB used to map higher pages. 1024 - 128 = 896 Low memory : directly addressable by Kernel High memory : must use the 128MB indirection table to be addressed 64bit : all space directly addressable by MMU
  • 34. SW : Linux Internals - Zones Jul 12 22:13:12 [server] kernel: swapper: page allocation failure. order:2, mode:0x4020 Jul 12 22:46:46 [server] kernel: [app_name]: page allocation failure. order:4, mode:0xd0 include/linux/gfp.h : Zone usage per flag * bit result * ================= * 0x0 => NORMAL * 0x1 => DMA or NORMAL * 0x2 => HIGHMEM or NORMAL * 0x3 => BAD (DMA+HIGHMEM) * 0x4 => DMA32 or DMA or NORMAL * 0x5 => BAD (DMA+DMA32) * 0x6 => BAD (HIGHMEM+DMA32) * 0x7 => BAD (HIGHMEM+DMA32+DMA) * 0x8 => NORMAL (MOVABLE+0) * 0x9 => DMA or NORMAL (MOVABLE+DMA) * 0xa => MOVABLE (Movable is valid only if HIGHMEM is set too) * 0xb => BAD (MOVABLE+HIGHMEM+DMA) * 0xc => DMA32 (MOVABLE+DMA32) * 0xd => BAD (MOVABLE+DMA32+DMA) * 0xe => BAD (MOVABLE+DMA32+HIGHMEM) * 0xf => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
  • 35. SW : Linux Internals - Buddy Allocator Goal : limit External Fragmentation and Internal Fragmentation
  • 36. SW : Linux Internals - Buddy Allocator 4K pages are grouped by 29 (2MB) or 210 (4MB) blocks Blocks are then cut in half (2 buddies) to service the request Upon release, tries to merge buddy pages back together
  • 37. SW : Linux Internals - Buddy Allocator Unmovable : Locked in memory Reclaimable : Reusable after clean Movable : Immediately available Reserve : Last resort reserve Isolate : keep on local NUMA node CMA : Contiguous Memory Allocator, for DMA devices with large contiguous cat /proc/pagetypeinfo Page block order: 10 Pages per block: 1024 Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10 Node 0, zone DMA, type Unmovable 1 1 0 0 2 1 1 0 1 1 0 Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 0 2 Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0 0 1 Node 0, zone DMA32, type Unmovable 8 6 0 0 0 0 0 0 0 0 0 Node 0, zone DMA32, type Reclaimable 376 2817 5 0 0 0 0 0 0 0 0 Node 0, zone DMA32, type Movable 6323 12025 287 0 0 0 0 0 0 0 0 Node 0, zone DMA32, type Reserve 0 0 1 4 6 2 0 0 1 1 0 Node 0, zone Normal, type Unmovable 2611 137 0 0 0 0 0 0 0 0 0 Node 0, zone Normal, type Reclaimable 33847 4321 144 0 0 0 0 0 0 0 0 Node 0, zone Normal, type Movable 37312 9849 1097 0 0 0 0 0 0 0 0 Node 0, zone Normal, type Reserve 0 0 5 1 0 1 1 2 1 0 1 Number of blocks type Unmovable Reclaimable Movable Reserve Node 0, zone DMA 1 0 2 1 Node 0, zone DMA32 13 18 796 1 When no space is available, buddy-allocator will call kswapd
  • 38. SW : Linux Internals - SLAB SLAB = Allocator for Kernel Objects Uses cache to avoid fragmentation Each kernel object is stored in a SLAB SLAB 1 Queue / NUMA Node SLUB 1 Queue / CPU SLOB As compact as possible Most servers use SLUB : - Defrag - Debug
  • 39. SW : Linux Internals - SLAB # grep -v '# name' /proc/slabinfo | tr -d '#' | column -t slabinfo - version: 2.1 [ . . . ] ext4_inode_cache 294150 294210 1072 30 8 : tunables 0 0 0 : slabdata 9807 9807 0 ext4_allocation_context 224 224 128 32 1 : tunables 0 0 0 : slabdata 7 7 0 ext4_io_end 1472 1472 64 64 1 : tunables 0 0 0 : slabdata 23 23 0 ext4_extent_status 143365 221442 40 102 1 : tunables 0 0 0 : slabdata 2171 2171 0 [ . . . ] inode_cache 13493 15596 584 28 4 : tunables 0 0 0 : slabdata 557 557 0 dentry 449376 450471 192 21 1 : tunables 0 0 0 : slabdata 21451 21451 0 buffer_head 650327 840294 104 39 1 : tunables 0 0 0 : slabdata 21546 21546 0 task_struct 1299 1336 7936 4 8 : tunables 0 0 0 : slabdata 334 334 0 cred_jar 88986 89271 192 21 1 : tunables 0 0 0 : slabdata 4251 4251 0 [ . . . ] task_struct (large object) Object size 7936 B 4 Objects / slab 31744 B 8 Pages / slab 32768 B Loss : 32768 - 31744 = 1024 B / slab 334 slabs 334 KB lost / 10688 KB (3.125%) ext4_extent_status (small and compact) Object size 40 B 102 obj / slab 4080 B 1 Page / slab 4096 B Loss : 4096 - 4080 = 16B /slab (0.39%) 2171 slabs 334 KB lost / 8892 KB ()
  • 40. cat /proc/meminfo MemTotal: 16328616 kB MemFree: 4021720 kB MemAvailable: 6653544 kB Buffers: 380220 kB Cached: 4688968 kB SwapCached: 0 kB Active: 7703764 kB Inactive: 3890964 kB Active(anon): 6546524 kB Inactive(anon): 2354396 kB Active(file): 1157240 kB Inactive(file): 1536568 kB Unevictable: 20988 kB Mlocked: 20988 kB SwapTotal: 8191996 kB SwapFree: 8191996 kB Dirty: 92 kB Writeback: 0 kB AnonPages: 6546580 kB Mapped: 1763640 kB Shmem: 2368152 kB Slab: 383576 kB SReclaimable: 280636 kB SUnreclaim: 102940 kB KernelStack: 19632 kB PageTables: 115868 kB [ ... ] SW : Linux Internals - /proc
  • 41. SW : Linux Internals - /proc pmap -X $PID # read /proc/$PID/smaps 21797: /bin/bash Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked Mapping 5576c6639000 r-xp 00000000 fd:04 268557 992 912 80 912 0 0 0 0 0 0 0 bash 5576c6930000 r--p 000f7000 fd:04 268557 16 16 16 16 16 0 0 0 0 0 0 bash 5576c6934000 rw-p 000fb000 fd:04 268557 36 36 36 36 36 0 0 0 0 0 0 bash 5576c693d000 rw-p 00000000 00:00 0 20 20 20 20 20 0 0 0 0 0 0 5576c806d000 rw-p 00000000 00:00 0 1508 1472 1472 1472 1472 0 0 0 0 0 0 [heap] 7f740b9ea000 r-xp 00000000 fd:04 267260 44 44 0 44 0 0 0 0 0 0 0 libnss_files-2.22.so 7f740b9f5000 ---p 0000b000 fd:04 267260 2044 0 0 0 0 0 0 0 0 0 0 libnss_files-2.22.so 7f740bbf4000 r--p 0000a000 fd:04 267260 4 4 4 4 4 0 0 0 0 0 0 libnss_files-2.22.so 7f740bbf5000 rw-p 0000b000 fd:04 267260 4 4 4 4 4 0 0 0 0 0 0 libnss_files-2.22.so 7f740bbf6000 rw-p 00000000 00:00 0 24 0 0 0 0 0 0 0 0 0 0 7f740bbfc000 r--p 00000000 fd:04 267226 109328 504 19 504 0 0 0 0 0 0 0 locale-archive 7f74126c0000 r-xp 00000000 fd:04 267234 1756 1536 10 1536 0 0 0 0 0 0 0 libc-2.22.so 7f7412877000 ---p 001b7000 fd:04 267234 2048 0 0 0 0 0 0 0 0 0 0 libc-2.22.so 7f7412a77000 r--p 001b7000 fd:04 267234 16 16 16 16 16 0 0 0 0 0 0 libc-2.22.so 7f7412a7b000 rw-p 001bb000 fd:04 267234 8 8 8 8 8 0 0 0 0 0 0 libc-2.22.so 7f7412a7d000 rw-p 00000000 00:00 0 16 12 12 12 12 0 0 0 0 0 0 7f7412a81000 r-xp 00000000 fd:04 267240 12 12 0 12 0 0 0 0 0 0 0 libdl-2.22.so 7f7412a84000 ---p 00003000 fd:04 267240 2044 0 0 0 0 0 0 0 0 0 0 libdl-2.22.so 7f7412c83000 r--p 00002000 fd:04 267240 4 4 4 4 4 0 0 0 0 0 0 libdl-2.22.so 7f7412c84000 rw-p 00003000 fd:04 267240 4 4 4 4 4 0 0 0 0 0 0 libdl-2.22.so 7f7412c85000 r-xp 00000000 fd:04 270666 152 152 12 152 0 0 0 0 0 0 0 libtinfo.so.5.9 7f7412cab000 ---p 00026000 fd:04 270666 2044 0 0 0 0 0 0 0 0 0 0 libtinfo.so.5.9 7f7412eaa000 r--p 00025000 fd:04 270666 16 16 16 16 16 0 0 0 0 0 0 libtinfo.so.5.9 7f7412eae000 rw-p 00029000 fd:04 270666 4 4 4 4 4 0 0 0 0 0 0 libtinfo.so.5.9 7f7412eaf000 r-xp 00000000 fd:04 267194 132 132 0 132 0 0 0 0 0 0 0 ld-2.22.so 7f74130a3000 rw-p 00000000 00:00 0 20 20 20 20 20 0 0 0 0 0 0 7f74130a8000 r--p 00000000 fd:04 1442509 124 64 9 64 0 0 0 0 0 0 0 bash.mo 7f74130c7000 r--s 00000000 fd:04 527264 28 28 0 28 0 0 0 0 0 0 0 gconv-modules.cache 7f74130ce000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0 7f74130cf000 r--p 00020000 fd:04 267194 4 4 4 4 4 0 0 0 0 0 0 ld-2.22.so 7f74130d0000 rw-p 00021000 fd:04 267194 4 4 4 4 4 0 0 0 0 0 0 ld-2.22.so 7f74130d1000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0 7ffc4fbbe000 rw-p 00000000 00:00 0 136 32 32 32 32 0 0 0 0 0 0 [stack] 7ffc4fbe8000 r--p 00000000 00:00 0 8 0 0 0 0 0 0 0 0 0 0 [vvar] 7ffc4fbea000 r-xp 00000000 00:00 0 8 4 0 4 0 0 0 0 0 0 0 [vdso] ffffffffff600000 r-xp 00000000 00:00 0 4 0 0 0 0 0 0 0 0 0 0 [vsyscall] ====== ==== ==== ========== ========= ============== ============== =============== ==== ======= ====== 122620 5072 1814 5072 1684 0 0 0 0 0 0 KB
  • 42. Software : Application Memory Allocator
  • 43. SW : Application Memory Allocator
  • 44. SW : Application Memory Allocator
  • 45. SW : Application Memory Allocator malloc NOT A SYSCALL (mmap + brk) Speed & minimal fragmentation Multiple implementations : - dlmalloc Doug-Lea (Generic) - ptmalloc2 Current glibc - jemalloc Jason Evans (FreeBSD, Firefox, FB) - tcmalloc Thread-Caching (Google) - libumem Solaris - €€€€€ lockless, hoard, smartheap...
  • 46. SW : Application Memory Allocator - ptmalloc2 Uses brk or mmap to do allocation - brk / sbrk for main thread and if req < 128KB - mmap otherwise. Maintains arenas : main & thread Each arena is composed of shards Upon free() ptmalloc adds freed region to a “bin” to be reused for later allocations: - Fast 16 - 80 bytes - Unsorted No size limit. Latest freed. - Small < 512 bytes - Large >= 512 bytes
  • 47. SW : Application Memory Allocator - ptmalloc2 Internal Structures : - malloc_state : Arena Header. Has multiple heaps, except for Main Arena (which just grows its heap) - heap_info : Heap Header. Has multiple chunks - malloc_chunk : Chunk Header. Result of malloc()
  • 49. Security : Some Attacks Rowhammer Bitflip on DRAM Rows Evil Maid DMA using physical access Stack Clash Huge stack usage to overlap on heap
  • 51. Bibliography - Hardware https://compas.cs.stonybrook.edu/~nhonarmand/courses/sp15/cse502/slides/06-main_mem.pdf http://www.eng.utah.edu/~cs7810/pres/11-7810-12.pdf http://slideplayer.fr/slide/3279682/ https://forums.tweaktown.com/gigabyte/27283-memory-timings-explained-suggested-timings-memset-vs-bios.html http://www.masterslair.com/memory-ram-timings-latency-cas-ras-tcl-trcd-trp-tras https://www.slideshare.net/abhilash128/lec-21-16642228 https://en.wikichip.org http://www.overclockingmadeinfrance.com/quest-ce-que-les-cas-latency/ https://stackoverflow.com/questions/29522431/does-a-branch-misprediction-flush-the-entire-pipeline-even-for-very-short-if-st https://kshitizdange.github.io/418CacheSim/final-report http://www.lifl.fr/~marquet/cnl/ssam/ssam-c3.pdf http://www.toves.org/books/cache/ http://www.simmtester.com/page/news/showpubnews.asp?num=168 http://wiki.osdev.org/X86-64