Driver development – memory
• Physical memory and virtual memory
• Virtual memory organization
• Physical and virtual memory mapping
• Accessing physical memory
• Allocators in kernel memory
• Kmalloc allocator and APIs
• Vmalloc allocator and APIs
Physical and virtual addresses
• Each process has its own virtual address space, and runs as if it had
access to the whole address space, independently of the physical
address space.
• Physical memory is storage hardware that records data with low latency
and small granularity.
• Physical memory addresses are numbers sent across a memory bus to
identify the specific memory cell within a piece of storage hardware
associated with a given read or write operation.
• Examples of storage hardware providing physical memory are DIMMs
(DRAM), SD memory cards (flash), video cards (frame buffers and texture
memory), and so on.
• Only the kernel uses physical memory addresses directly.
• User space programs exclusively use virtual addresses.
• Virtual memory provides a software-controlled set of memory addresses,
allowing each process to have its own unique view of the computer's
memory.
• Virtual addresses only make sense within a given context, such as a
specific process. The same virtual address can simultaneously mean
different things in different contexts.
• Virtual addresses are the size of a CPU register. On 32-bit systems each
process has 4 gigabytes of virtual address space all to itself, which is often
more memory than the system actually has.
• Virtual addresses are interpreted by a processor's Memory Management
Unit (MMU), using data structures called page tables which map virtual
address ranges to associated content.
• Virtual memory is used to implement allocation, swapping, file mapping,
copy on write shared memory, defragmentation, and more.
Memory Management Unit (MMU)
• The memory management unit is the part of the CPU that interprets
virtual addresses.
• Attempts to read, write, or execute memory at virtual addresses are
either translated to corresponding physical addresses, or else generate an
interrupt (page fault) to allow software to respond to the attempted
access.
• This gives each process its own virtual memory address range, which is
limited only by address space (4 gigabytes on most 32-bit systems), while
physical memory is limited by the amount of available storage hardware.
• Physical memory addresses are unique in the system; virtual memory
addresses are unique per-process.
• Page tables are data structures which contain a process's list of memory
mappings and track associated resources.
• Each process has its own set of page tables, and the kernel also has a few
page table entries for things like disk cache.
• 32-bit Linux systems traditionally use a three-level tree structure to
record page tables. The levels are the Page Global Directory (PGD), Page
Middle Directory (PMD), and Page Table Entries (PTEs).
• 64-bit Linux can use 4-level page tables.
• The CPU cache is a very small amount of very fast memory built into a
processor, containing temporary copies of data to reduce memory access
latency.
• The L1 cache is a tiny amount of memory (generally between 1k and 64k)
wired directly into the processor that can be accessed in a single clock
cycle.
• The L2 cache is a larger amount of memory (up to several megabytes)
adjacent to the processor, which can be accessed in a small number of
clock cycles.
• Access to un-cached memory (across the memory bus) can take dozens,
hundreds, or even thousands of clock cycles.
Translation look-aside buffer (TLB)
• The TLB is a small fixed-size array of recently used pages, which the CPU
checks on each memory access.
• It lists a few of the virtual address ranges to which physical pages are
currently mapped.
• The TLB is a cache for the MMU.
• Accesses to virtual addresses listed in the TLB go directly through to the
associated physical memory.
• Accesses to virtual addresses not listed in the TLB (a "TLB miss") trigger a
page table lookup, which is performed either by hardware, or by the page
fault handler, depending on processor type.
Kernel memory - pages
• The kernel treats physical pages as the basic unit of memory management.
• Although the processor’s smallest addressable unit is a byte or a word, the
memory management unit typically deals in pages.
• In terms of virtual memory, pages are the smallest unit that matters.
• Most 32-bit architectures have 4KB pages; most 64-bit architectures also
default to 4KB pages (e.g. x86-64 and ARM64), though some use larger
pages (e.g. 8KB on Alpha).
• This implies that on a machine with 4KB pages and 1GB of memory,
physical memory is divided into 262,144 distinct pages.
• The kernel memory manager also handles allocations smaller than a page
using the SLAB/SLUB allocator.
• Kernel-allocated pages cannot be swapped out. They always remain in
physical memory.
• Not all memory is equally addressable.
• Different types of memory have to be used for different things.
• Linux uses different zones to handle this:
– ZONE_DMA: some older I/O devices can only address memory up to
16 MB (on x86)
– ZONE_NORMAL: regular memory, up to 896 MB on 32-bit x86
– ZONE_HIGHMEM: memory above 896 MB
Virtual memory organization:
• 1GB reserved for kernel-space
• Contains kernel code and core data structures
identical in all address spaces
• Most memory can be a direct mapping of
physical memory at a fixed offset
• Complete 3GB exclusive mapping available for
each user-space process
• Process code and data (program, stack, …)
• Memory-mapped files, not necessarily
mapped to physical memory
Page allocators in the kernel
• SLAB allocator: built on top of the page allocator; allows the creation of
caches, each cache storing objects of the same size.
• Page allocator: allows the allocation of contiguous areas of physical
pages (4K, 8K, 16K, etc.); suitable for allocations larger than the page
size.
• The kernel represents every physical page on the system with the ‘struct
page’ data structure, defined in linux/mm_types.h.
• The kernel uses this data structure to keep track of all pages in the
system, because the kernel needs to know whether a page is free (i.e. not
allocated) and, if not, who owns it.
• The allocated area is virtually contiguous but also physically contiguous. It
is allocated in the identity-mapped part of the kernel memory space.
• This means that large areas may be unavailable or hard to obtain due
to physical memory fragmentation.
• The kernel provides one low-level mechanism for requesting memory,
along with several interfaces to access it.
• All these interfaces allocate memory with page-size granularity and are
declared in linux/gfp.h.
• The core function is
struct page* alloc_pages(gfp_t gfp_mask, unsigned int order);
• This allocates 2^order (i.e. 1<<order) contiguous physical pages
• On success, returns a pointer to the first page’s page structure
• On error, returns NULL
• To get logical address from the page pointer
void *page_address(struct page *page);
• This returns a pointer to the logical address where the given physical page
currently resides.
• If you don’t need the actual struct page, you can call
unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
• This function works the same as alloc_pages(), except that it directly
returns the logical address of the first requested page.
• To allocate a single page
struct page * alloc_page(gfp_t gfp_mask);
unsigned long __get_free_page(gfp_t gfp_mask);
• A family of functions enables you to free allocated pages when you no
longer need them:
void __free_pages(struct page *page, unsigned int order)
void free_pages(unsigned long addr, unsigned int order)
void free_page(unsigned long addr)
• You must be careful to free only pages you allocate.
• Passing the wrong struct page or address, or the incorrect order, can
result in corruption.
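The functions above can be combined as in the following sketch of a kernel-module helper (the function name is hypothetical; the APIs are the ones from linux/gfp.h declared above):

```c
/* Sketch: allocating and freeing contiguous physical pages in
 * kernel code, pairing each allocation with the matching free
 * at the same order. */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/printk.h>

static void page_alloc_demo(void)
{
	struct page *page;
	unsigned long addr;

	/* Allocate 2^3 = 8 contiguous physical pages */
	page = alloc_pages(GFP_KERNEL, 3);
	if (!page)
		return;		/* always check for failure */

	/* Convert the struct page to a logical (virtual) address */
	pr_info("first page mapped at %p\n", page_address(page));

	__free_pages(page, 3);	/* free with the same order */

	/* If the struct page is not needed, get the address directly */
	addr = __get_free_pages(GFP_KERNEL, 2);	/* 4 pages */
	if (addr)
		free_pages(addr, 2);
}
```

Note that the order passed to the free function must match the one used at allocation time; there is no bookkeeping to catch a mismatch.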
Page allocator flags
• GFP_KERNEL: standard kernel memory allocation. The allocation may
block in order to find enough available memory. Fine for most needs,
except in interrupt handler context.
• GFP_ATOMIC: RAM allocated from code which is not allowed to block
(interrupt handlers or critical sections). Never blocks, and may dip into
emergency pools, but can fail if no free memory is readily available.
• GFP_DMA: allocates memory in an area of the physical memory usable
for DMA transfers (see ZONE_DMA).
• Others are defined in include/linux/gfp.h
• (GFP: __get_free_pages).
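The flag choice follows directly from the calling context, as in this sketch (the driver functions are hypothetical; the flags are the standard ones from include/linux/gfp.h):

```c
/* Sketch: picking GFP flags by context in a driver. */
#include <linux/gfp.h>
#include <linux/interrupt.h>
#include <linux/slab.h>

/* Process context (e.g. probe, open): sleeping is allowed,
 * so GFP_KERNEL is the right choice. */
static void *setup_buffer(size_t len)
{
	return kmalloc(len, GFP_KERNEL);
}

/* Interrupt context: must never sleep, so GFP_ATOMIC is
 * required, and the allocation can fail more easily. */
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	void *scratch = kmalloc(64, GFP_ATOMIC);

	if (!scratch)
		return IRQ_HANDLED;	/* degrade gracefully on failure */
	/* ... use scratch ... */
	kfree(scratch);
	return IRQ_HANDLED;
}
```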
• There are certain kinds of data structures that are frequently allocated
and freed.
• Instead of constantly asking the kernel memory allocator for such pieces,
they’re allocated in groups and freed to per-type linked lists.
• To allocate such an object, check the linked list; only if it’s empty is the
generic memory allocator called.
• The object size can be smaller or greater than the page size
• To free such an item, just put it back on the list.
• If the free objects in a slab constitute an entire page, that page can be
reclaimed when memory runs low.
• The SLAB allocator takes care of growing or reducing the size of the cache
as needed, depending on the number of allocated objects. It uses the
page allocator to allocate and free pages.
• SLAB caches are used for data structures that are present in many
instances in the kernel: directory entries, file objects, network packet
descriptors, process descriptors, etc.
• See /proc/slabinfo
• They are rarely used for individual drivers.
• See include/linux/slab.h for the API
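A minimal sketch of the cache API from include/linux/slab.h (the object type and cache name are hypothetical):

```c
/* Sketch: a per-type SLAB cache for frequently allocated objects. */
#include <linux/errno.h>
#include <linux/slab.h>

struct my_obj {			/* hypothetical example structure */
	int id;
	char name[32];
};

static struct kmem_cache *my_cache;

static int cache_demo(void)
{
	struct my_obj *obj;

	/* Create a cache whose objects are all sizeof(struct my_obj) */
	my_cache = kmem_cache_create("my_obj_cache",
				     sizeof(struct my_obj),
				     0, SLAB_HWCACHE_ALIGN, NULL);
	if (!my_cache)
		return -ENOMEM;

	obj = kmem_cache_alloc(my_cache, GFP_KERNEL);	/* take from the cache */
	if (obj)
		kmem_cache_free(my_cache, obj);		/* return to the cache */

	kmem_cache_destroy(my_cache);	/* all objects must be freed first */
	return 0;
}
```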
• The kmalloc() function is a simple interface for obtaining kernel memory
in byte-sized chunks. If you need whole pages, the previously discussed
interfaces might be a better choice.
• The kmalloc allocator is the general purpose memory allocator in the
Linux kernel, for objects from 8 bytes to 128 KB
• The allocated area is guaranteed to be physically contiguous
• The allocated area size is rounded up to the next power of two size
• The kmalloc() function’s operation is similar to that of user-space’s
familiar malloc() routine, with the exception of the additional flags
parameter.
• It uses the same flags as the page allocator (a gfp_t mask), with the same
meaning.
• It should be used as the primary allocator unless there is a strong reason
to use another one.
• #include <linux/slab.h>
void *kmalloc(size_t size, gfp_t flags);
• Allocate size bytes, and return a pointer to the area (virtual address)
• size: number of bytes to allocate
• flags: same flags as the page allocator
void *kzalloc(size_t size, gfp_t flags);
• Allocates a zero-initialized buffer
void kfree(const void *ptr);
• Free an allocated area
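Typical usage looks like this sketch (the helper names and buffer size are arbitrary; the API is the one from linux/slab.h above):

```c
/* Sketch: byte-sized kernel allocations with kmalloc()/kzalloc(). */
#include <linux/slab.h>

static char *make_buffer(size_t len)
{
	/* kzalloc() = kmalloc() + zeroing; GFP_KERNEL may sleep */
	char *buf = kzalloc(len, GFP_KERNEL);

	if (!buf)
		return NULL;	/* always check for allocation failure */
	return buf;
}

static void drop_buffer(char *buf)
{
	kfree(buf);		/* kfree(NULL) is a safe no-op */
}
```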
• The vmalloc() function works in a similar fashion to kmalloc(), except it
allocates memory that is only virtually contiguous and not necessarily
physically contiguous.
• This is how a user-space allocation function works.
• The pages returned by malloc() are contiguous within the virtual address
space of the processor, but there is no guarantee that they are actually
contiguous in physical RAM.
• The kmalloc() function guarantees that the pages are physically
contiguous (and virtually contiguous).
• The vmalloc() function ensures only that the pages are contiguous within
the virtual address space.
• It does this by allocating potentially non-contiguous chunks of physical
memory and “fixing up” the page tables to map the memory into a
contiguous chunk of the logical address space.
• Most hardware devices require physically contiguous memory.
• Any regions of memory that hardware devices work with must exist as a
physically contiguous block and not merely a virtually contiguous one.
• Blocks of memory used only by software— for example, process-related
buffers—are fine using memory that is only virtually contiguous.
• In your programming, you never know the difference.
• All memory appears to the kernel as logically contiguous.
• #include <linux/vmalloc.h>
void *vmalloc(unsigned long size);
• On success, returns pointer to virtually contiguous memory
• On error, returns NULL
void vfree(const void *ptr);
• Frees the block of memory beginning at ‘ptr’ that was previously
allocated with vmalloc().
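A short sketch of the pair in use (the 1 MB size is an arbitrary example, chosen because an allocation that large could fail under kmalloc() if physical memory is fragmented):

```c
/* Sketch: a large, virtually contiguous buffer with vmalloc(). */
#include <linux/vmalloc.h>

static void vmalloc_demo(void)
{
	/* 1 MB: only virtual contiguity is needed, so vmalloc()
	 * succeeds even when physical memory is fragmented. */
	void *buf = vmalloc(1024 * 1024);

	if (!buf)
		return;		/* vmalloc() can still fail; check it */

	/* ... use buf from process context (may sleep) ... */

	vfree(buf);		/* vmalloc() must be paired with vfree() */
}
```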
Picking an allocation method
• If you need contiguous physical pages, use one of the low-level page
allocators or kmalloc().
• The two most common flags given to these functions are GFP_ATOMIC
and GFP_KERNEL.
• Specify the GFP_ATOMIC flag to perform a high priority allocation that
will not sleep. This is a requirement of interrupt handlers and other pieces
of code that cannot sleep.
• Code that can sleep, such as process context code , should use
GFP_KERNEL. This flag specifies an allocation that can sleep, if needed, to
obtain the requested memory.
• If you do not need physically contiguous pages, only virtually contiguous
ones, use vmalloc().