Page Frame Management
Page Descriptors
The kernel must keep track of the status of each page frame.
/*
* Each physical page in the system has a struct page associated with
* it to keep track of whatever it is we are using the page for at the
* moment. Note that we have no way to track which tasks are using
* a page, though if it is a pagecache page, rmap structures can tell us
* who is mapping it.
*/
All page descriptors are stored in the mem_map array.
Several fields of the page descriptor have different meanings, depending on whether the page frame is free or on which kernel component is using the page frame.
Flags describing the status of a page frame
Extended Paging:
Extended Paging allows page frames to be 4 MB instead of 4 KB in size; it is mainly used to translate large contiguous ranges of linear addresses into the corresponding physical ones.
NUMA
Non-Uniform Memory Access
The Linux kernel supports NUMA by minimizing the number of accesses to costly nodes and by carefully choosing where the data structures most often accessed by the CPU are stored.
Node descriptor ---- pg_data_t
The physical memory inside each node can be split into several zones.
Memory Zones
Hardware constraints may limit the way page frames can be used.
1. DMA processors for old ISA buses have a strong limit: they can address only the first 16 MB of RAM.
2. The CPU may not be able to directly access all physical memory because the linear address space is too small.
To cope with these two limitations, Linux 2.6 partitions the physical memory of every node into three zones:
1. ZONE_DMA (contains page frames below 16 MB)
2. ZONE_NORMAL (16 MB – 896 MB)
3. ZONE_HIGHMEM (> 896 MB; always empty on 64-bit architectures)
Type --- zone
The Pool of Reserved Page Frames
Some kernel control paths must never block while requesting memory. This happens, for instance, when handling an interrupt or when executing code in a critical section.
Atomic memory allocation requests
The kernel reserves a pool of page frames for atomic memory allocation requests, to be used only in low-on-memory conditions.
The Zoned Page Frame Allocator
Kernel Mappings of High-Memory Page Frames
E.g., __get_free_pages(GFP_HIGHMEM, 0)
This function is supposed to return the linear address of the assigned page frame. However, page frames in high memory are not mapped into the kernel's linear address space, so the function returns NULL. Worse, the page frame may already have been allocated, and it can then never be released, because the kernel has lost track of it.
On 32-bit platforms, Linux adopts the following approach:
1. The allocation of high-memory page frames is only done through alloc_pages().
2. The last 128 MB of the kernel's linear address space are dedicated to mapping high-memory page frames. The mappings are temporary, and the linear addresses are recycled.
Three mechanisms to map high-memory page frames:
1. Permanent kernel mapping
2. Temporary kernel mapping
3. Noncontiguous memory allocation
Permanent Kernel Mapping
To establish long-lasting mappings of high-memory page frames into the kernel address space, the kernel uses a dedicated page table called pkmap_page_table.
To keep track of the association between high-memory page frames and linear addresses, the kernel makes use of a hash table called page_address_htable which consists of entries of type page_address_map.
/*
* Describes one page->virtual association
*/
struct page_address_map {
struct page *page;
void *virtual;
struct list_head list;
};
The kmap() function establishes a permanent kernel mapping.
Temporary Kernel Mapping
Requesting a temporary kernel mapping never blocks the current process, so it can be invoked inside an interrupt handler or a deferrable function.
enum km_type {
KMAP_D(0) KM_BOUNCE_READ,
KMAP_D(1) KM_SKB_SUNRPC_DATA,
KMAP_D(2) KM_SKB_DATA_SOFTIRQ,
KMAP_D(3) KM_USER0,
KMAP_D(4) KM_USER1,
KMAP_D(5) KM_BIO_SRC_IRQ,
KMAP_D(6) KM_BIO_DST_IRQ,
KMAP_D(7) KM_PTE0,
KMAP_D(8) KM_PTE1,
KMAP_D(9) KM_IRQ0,
KMAP_D(10) KM_IRQ1,
KMAP_D(11) KM_SOFTIRQ0,
KMAP_D(12) KM_SOFTIRQ1,
KMAP_D(13) KM_SYNC_ICACHE,
KMAP_D(14) KM_SYNC_DCACHE,
/* UML specific, for copy_*_user - used in do_op_one_page */
KMAP_D(15) KM_UML_USERCOPY,
KMAP_D(16) KM_IRQ_PTE,
KMAP_D(17) KM_NMI,
KMAP_D(18) KM_NMI_PTE,
KMAP_D(19) KM_KDB,
/*
* Remember to update debug_kmap_atomic() when adding new kmap types!
*/
KMAP_D(20) KM_TYPE_NR
};
Index of a fix-mapped linear address.
enum fixed_addresses {
#define FIX_N_COLOURS 8
FIX_CMAP_BEGIN,
#ifdef CONFIG_MIPS_MT_SMTC
FIX_CMAP_END = FIX_CMAP_BEGIN + (FIX_N_COLOURS * NR_CPUS * 2),
#else
FIX_CMAP_END = FIX_CMAP_BEGIN + (FIX_N_COLOURS * 2),
#endif
#ifdef CONFIG_HIGHMEM
/* reserved pte's for temporary kernel mappings */
FIX_KMAP_BEGIN = FIX_CMAP_END + 1,
FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
#endif
__end_of_fixed_addresses
};
kmap_atomic()
kunmap_atomic()
The Buddy System Algorithm
External fragmentation
Related Topics
Fragmentation:
- External fragmentation
- Internal fragmentation
- Data fragmentation
Fragmentation can be accepted in return for increase in speed or simplicity.
External fragmentation does the most damage to the system.
Two ways to avoid external fragmentation
1. Use the paging circuitry to map groups of noncontiguous page frames into ranges of contiguous linear addresses. This requires modifying the Page Tables frequently, resulting in higher memory access times.
2. Develop a suitable technique to keep track of the existing free contiguous page frames, avoiding as much as possible the need to split up a large free block.
Memory Area Management
The Slab Allocator
The slab allocator groups objects into caches. Each cache is a “store” of objects of the same type.
Cache Descriptor
struct kmem_cache {
/* cache flags */
/* slab size */
/* object size */
/* number of objects per slab */
struct kmem_list3 *nodelists[MAX_NUMNODES];
};
/*
* The slab lists for all objects.
*/
struct kmem_list3 {
struct list_head slabs_partial; /* partial list first, better asm code */
struct list_head slabs_full;
struct list_head slabs_free;
unsigned long free_objects;
spinlock_t list_lock;
};
Slab Descriptor
kmem_cache_create
kmem_cache_destroy
/proc/slabinfo
Interfacing the slab allocator with the zoned page frame allocator
kmem_getpages()
kmem_freepages()
These two functions both rely on the zoned page frame allocator to obtain or release a group of contiguous page frames.
Allocating a Slab to a Cache
- A request has been issued to allocate a new object.
- The cache does not include a free object.
Releasing a Slab from a Cache
- There are too many free objects in the slab cache.
- A timer invoked periodically determines that there are fully unused slabs that can be released.
Object Descriptor
/*
* kmem_bufctl_t:
*
* Bufctl's are used for linking objs within a slab
* linked offsets.
*
* This implementation relies on "struct page" for locating the cache &
* slab an object belongs to.
* This allows the bufctl structure to be small (one int), but limits
* the number of objects a slab (not a cache) can contain when off-slab
* bufctls are used. The limit is the size of the largest general cache
* that does not use off-slab slabs.
* For 32bit archs with 4 kB pages, is this 56.
* This is not serious, as it is only for large objects, when it is unwise
* to have too many per slab.
* Note: This limit can be raised by introducing a general cache whose size
* is less than 512 (PAGE_SIZE<<3), but greater than 256.
*/
typedef unsigned int kmem_bufctl_t;
#define BUFCTL_END (((kmem_bufctl_t)(~0U))-0)
#define BUFCTL_FREE (((kmem_bufctl_t)(~0U))-1)
#define BUFCTL_ACTIVE (((kmem_bufctl_t)(~0U))-2)
#define SLAB_LIMIT (((kmem_bufctl_t)(~0U))-3)
Slab descriptor and object descriptors are stored in two ways:
External: in the general cache
Internal: inside the slab
Slab Coloring
Goal: try to achieve the best possible performance of the microprocessor’s hardware cache
Policy: assign arbitrary numbers called colors to the slabs, which changes the offset of the first object in each slab.
Memory Pools
Difference between memory pools and reserved page frames
Reserved page frames: used only to satisfy atomic memory allocation requests.
Memory pools: a reserve of dynamic memory, created by a specific kernel component, to be used only in low-on-memory emergencies.
Type: mempool_t
Questions
1. Which policies does Linux adopt to avoid external and internal fragmentation?
External fragmentation: Buddy System Algorithm
Internal fragmentation: Slab allocator scheme