Anyone who has used or learned C is no stranger to malloc. Everyone knows that malloc allocates a contiguous block of memory, which is released with free once it is no longer needed. However, many programmers are unfamiliar with how malloc works internally, and some even mistake it for a system call or a language keyword. In reality, malloc is just a function in the C standard library, and its basic implementation is not complicated. Any programmer with a basic understanding of C and operating systems can grasp it.
This article explains the mechanism behind malloc by implementing a simple version of it. Although this implementation is not as efficient as existing C standard library implementations (like glibc), it is much simpler than real-world implementations, making it easier to understand. What’s important is that this implementation follows the same principles as actual implementations.
The article will first introduce some fundamental concepts, such as how the operating system manages memory for processes and related system calls. Then, it will gradually build a simple version of malloc. For simplicity, the discussion will focus on the x86_64 architecture and the Linux operating system.
1. What is malloc
2. Preliminary Knowledge
2.1 Linux Memory Management
2.1.1 Virtual Memory Address and Physical Memory Address
2.1.2 Page and Address Composition
2.1.3 Memory Pages and Disk Pages
2.2 Linux Process Level Memory Management
2.2.1 Memory Layout
2.2.2 Heap Memory Model
2.2.3 brk and sbrk
2.2.4 Resource Limits and rlimit
3. Implementing Malloc
3.1 Toy Implementation
3.2 Formal Implementation
3.3 Legacy Issues and Optimization
4. Other References
1. What is malloc
Before implementing malloc, it is essential to define it precisely. In the C standard library, malloc is declared in <stdlib.h> with the prototype:
void* malloc(size_t size);
The function allocates a contiguous block of memory of at least size bytes and returns a pointer to the start of that block, or NULL if the request cannot be satisfied. The blocks returned by different calls must not overlap unless the earlier block has already been freed. Allocation should be fast; in particular, malloc must not rely on an NP-hard algorithm to place blocks. Finally, free and realloc must be implemented alongside malloc, so that memory can be released and resized.
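The non-overlap requirement can be stated as a small check. The sketch below runs it against the system malloc; the helper names ranges_disjoint and two_allocations_demo are made up for this illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if [a, a + a_len) and [b, b + b_len) share no byte, else 0. */
int ranges_disjoint(const void *a, size_t a_len, const void *b, size_t b_len) {
    uintptr_t a0 = (uintptr_t)a, b0 = (uintptr_t)b;
    return a0 + a_len <= b0 || b0 + b_len <= a0;
}

/* Two blocks that are live at the same time must never overlap. */
void two_allocations_demo(void) {
    void *p = malloc(32);
    void *q = malloc(64);
    assert(p != NULL && q != NULL);
    assert(ranges_disjoint(p, 32, q, 64));
    free(q);
    free(p);
}
```

The same check can later be pointed at a hand-written allocator to confirm that it honors the contract.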
2. Preliminary Knowledge
Before diving into the implementation of malloc, it's crucial to understand some Linux memory-related concepts.
2.1 Linux Memory Management
2.1.1 Virtual Memory Address and Physical Memory Address
Modern operating systems use virtual memory addressing. Each process appears to have a large address space of its own, in principle up to 2^64 bytes on a 64-bit system (current x86_64 processors typically implement fewer address bits, commonly 48). This lets programs operate as if they had access to more memory than is physically available. During execution, the MMU (Memory Management Unit) translates each virtual address into a physical one.
2.1.2 Page and Address Composition
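In modern operating systems, memory is managed in fixed-size blocks called pages. A typical page size is 4096 bytes (4K). A memory address can therefore be divided into a page number and an offset within the page: with 4096-byte pages, the low 12 bits are the offset and the remaining high bits are the page number. Only the page number has to be translated from virtual to physical; the offset is carried over unchanged, which keeps the mapping tables small.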
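The split of an address into page number and offset is plain bit arithmetic. Here is a minimal sketch assuming the 4096-byte page size mentioned above; PAGE_SHIFT, page_number, page_offset, and make_address are names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                          /* log2(4096), the assumed page size */
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)

/* High bits of the address: which page it falls in. */
uintptr_t page_number(uintptr_t addr) { return addr >> PAGE_SHIFT; }

/* Low 12 bits of the address: position inside that page. */
uintptr_t page_offset(uintptr_t addr) { return addr & (PAGE_SIZE - 1); }

/* Rebuild an address from its two parts. */
uintptr_t make_address(uintptr_t page, uintptr_t offset) {
    assert(offset < PAGE_SIZE);
    return (page << PAGE_SHIFT) | offset;
}
```

For example, the address 0x12345678 lies in page 0x12345 at offset 0x678. A real program should not hard-code the page size; on Linux it can be queried with sysconf(_SC_PAGESIZE).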
2.1.3 Memory Pages and Disk Pages
Physical memory can be viewed as a cache for pages stored on disk. When a program accesses a page that is not currently in physical memory, a page fault occurs and the kernel loads the corresponding page from disk into memory. The OS handles this transparently, so it does not need to be considered in the implementation of malloc.
2.2 Linux Process Level Memory Management
2.2.1 Memory Layout
To implement malloc we need to know how memory is arranged inside a process. On a 64-bit Linux system, user space occupies the lower part of the address space and the kernel the upper part. Within user space, the heap grows from low to high addresses, and its end is marked by a pointer called the break.
2.2.2 Heap Memory Model
The heap is the region that malloc draws from. The break pointer marks the end of the heap: addresses below the break are available to the process, while addresses beyond it are not valid until the break is moved forward with brk or sbrk.
2.2.3 brk and sbrk
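The brk and sbrk functions adjust the break pointer. brk sets the break to a given address and returns 0 on success or -1 on failure. sbrk moves the break by a given number of bytes and returns the previous break, or (void *)-1 on failure, so sbrk(0) returns the current break without changing it. Through these calls a process requests more memory from the OS. (On Linux, brk is the underlying system call and sbrk is a library function built on top of it.)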
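A short sketch of reading and moving the break with sbrk follows. It assumes Linux with glibc; current_break, grow_break, and break_demo are names invented for this example.

```c
#define _DEFAULT_SOURCE      /* exposes sbrk() in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Current program break; sbrk(0) moves nothing. */
void *current_break(void) { return sbrk(0); }

/* Move the break forward by `bytes`. Returns 0 on success, -1 on failure. */
int grow_break(size_t bytes) {
    if (bytes > (size_t)INTPTR_MAX) return -1;   /* sbrk takes a signed increment */
    return sbrk((intptr_t)bytes) == (void *)-1 ? -1 : 0;
}

/* Usage: after growing by one page, the break has advanced by at least that much. */
void break_demo(void) {
    char *before = current_break();
    int rc = grow_break(4096);
    char *after = current_break();
    assert(rc == 0);
    assert(after - before >= 4096);
}
```

The sketch only ever grows the heap. Moving the break backward is possible with a negative increment, but it is unsafe while other code (including the C library's own allocator) may still be using that memory.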
2.2.4 Resource Limits and rlimit
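Each process has resource limits, such as the maximum amount of memory it can use, and the kernel will not move the break beyond them. Each limit has a soft value, which is the one enforced, and a hard value, which caps the soft one. The limits are read with the getrlimit system call and changed with setrlimit.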
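As an illustration, the limit on the data segment (which includes the heap) can be read with getrlimit and the RLIMIT_DATA resource. The wrapper name data_segment_limit is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Stores the soft and hard data-segment limits in *soft and *hard.
 * Returns 0 on success, -1 if getrlimit fails. */
int data_segment_limit(rlim_t *soft, rlim_t *hard) {
    assert(soft != NULL && hard != NULL);
    struct rlimit lim;
    if (getrlimit(RLIMIT_DATA, &lim) != 0)
        return -1;
    *soft = lim.rlim_cur;   /* the limit the kernel enforces */
    *hard = lim.rlim_max;   /* ceiling for the soft limit */
    return 0;
}
```

A value of RLIM_INFINITY in either field means that no limit is set.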
3. Implementing Malloc
3.1 Toy Implementation
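A toy version of malloc can be written in a few lines with sbrk: move the break forward by the requested size and return the old break. However, this version records nothing about the blocks it hands out, so free cannot know how large a block is, and released memory can never be reused.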
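A minimal sketch of such a toy allocator, assuming Linux with glibc. It is named toy_malloc here so that it does not clash with the real malloc.

```c
#define _DEFAULT_SOURCE      /* exposes sbrk() in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hands out memory by moving the break; nothing is ever tracked or reused. */
void *toy_malloc(size_t size) {
    if (size > (size_t)INTPTR_MAX) return NULL;
    void *p = sbrk((intptr_t)size);      /* sbrk returns the old break */
    return p == (void *)-1 ? NULL : p;
}

/* Usage: consecutive allocations come back at increasing addresses. */
void toy_demo(void) {
    char *a = toy_malloc(16);
    char *b = toy_malloc(16);
    assert(a != NULL && b != NULL);
    assert(b >= a + 16);
}
```

There is no matching toy_free: with no record of block sizes, there is nothing a free function could do.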
3.2 Formal Implementation
To implement a more robust version of malloc, we manage memory blocks with a linked list. Each block starts with a small header of metadata recording its size, whether it is free, and a pointer to the next block; the usable data area follows the header.
3.2.1 Data Structure
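We define a block structure with fields for the size of the data area, a pointer to the next block, a free flag, padding, and a magic pointer. The padding keeps the header size a multiple of 8 bytes so that the data area following it stays aligned. The magic pointer stores the address of the block's own data area, which later lets free check that an address it receives really came from malloc.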
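One possible layout for this header is sketched below. The field names, the use of a flexible array member for the data area, and the helper block_from_data are choices made for this example; the magic pointer is assumed to hold the address of the block's own data area.

```c
#include <assert.h>
#include <stddef.h>

typedef struct block {
    size_t size;          /* bytes available in the data area */
    struct block *next;   /* next block in the list, NULL at the end */
    int free;             /* nonzero if the block can be reused */
    int padding;          /* fills the header out to a multiple of 8 bytes */
    void *ptr;            /* magic pointer, assumed to hold the address of data */
    char data[];          /* first byte of the data area (not part of the header) */
} block_t;

/* Size of the header alone, i.e. the distance from a block to its data. */
#define BLOCK_SIZE offsetof(block_t, data)

/* Step back from a data pointer to the header in front of it. */
block_t *block_from_data(void *p) {
    assert(p != NULL);
    return (block_t *)((char *)p - BLOCK_SIZE);
}
```

On x86_64 this header occupies 32 bytes, so a data area that starts right after an 8-byte-aligned header is itself 8-byte aligned.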
3.2.2 Finding the Right Block
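To find a suitable block, we walk the linked list looking for a block that is both free and large enough. A first-fit strategy is used here: the search stops at the first block that qualifies, rather than scanning the whole list for the best match, which keeps the search simple and fast.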
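The first-fit search can be sketched as follows, using a trimmed-down header with only the fields the search needs. The last parameter is an addition of this sketch: it remembers the last block visited, so that a caller could append a new block there if the search fails.

```c
#include <assert.h>
#include <stddef.h>

typedef struct block {
    size_t size;          /* bytes in the data area */
    struct block *next;   /* next block, NULL at the end */
    int free;             /* nonzero if available */
} block_t;

/* Returns the first free block with at least `size` bytes, or NULL.
 * *last is set to the last block examined before the result. */
block_t *find_block(block_t *head, block_t **last, size_t size) {
    assert(last != NULL);
    block_t *b = head;
    while (b != NULL && !(b->free && b->size >= size)) {
        *last = b;
        b = b->next;
    }
    return b;
}
```

When the search fails, *last points at the tail of the list (or is left untouched if the list is empty).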
3.2.3 Allocating New Blocks
If no suitable block is found, new memory is obtained with sbrk. The break is moved forward by the header size plus the requested size, a new header is written at the old break, and the new block is appended to the end of the list.
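A sketch of this step, assuming Linux with glibc, a request size that has already been rounded up to a multiple of 8, and a break that starts out 8-byte aligned. The function name extend_heap and the header layout are choices made for this example.

```c
#define _DEFAULT_SOURCE      /* exposes sbrk() in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

typedef struct block {
    size_t size;
    struct block *next;
    int free;
    int padding;
    char data[];
} block_t;

#define BLOCK_SIZE offsetof(block_t, data)

/* Creates a new block of `size` data bytes at the end of the heap and
 * links it after `last` (which may be NULL for the very first block). */
block_t *extend_heap(block_t *last, size_t size) {
    assert(size % 8 == 0);                       /* caller aligns the size */
    if (size > (size_t)INTPTR_MAX - BLOCK_SIZE) return NULL;
    void *p = sbrk((intptr_t)(BLOCK_SIZE + size));
    if (p == (void *)-1) return NULL;            /* the kernel refused */
    block_t *b = p;                              /* header sits at the old break */
    b->size = size;
    b->next = NULL;
    b->free = 0;
    if (last != NULL) last->next = b;
    return b;
}
```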
3.2.4 Splitting Blocks
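First fit may return a block much larger than the request, and the excess would be wasted. When the leftover space is large enough to hold a header plus a minimal data area, the block is split in two: the first part satisfies the request and the remainder becomes a new free block. This reduces internal fragmentation and improves memory usage.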
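The split can be sketched like this. MIN_DATA (the smallest data area worth creating) and the return-value convention are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct block {
    size_t size;
    struct block *next;
    int free;
    int padding;
    char data[];
} block_t;

#define BLOCK_SIZE offsetof(block_t, data)
#define MIN_DATA   8u      /* assumed smallest useful data area */

/* Shrinks b to `size` data bytes and turns the rest into a free block.
 * Returns 1 if the block was split, 0 if the leftover was too small. */
int split_block(block_t *b, size_t size) {
    assert(size % 8 == 0);                       /* keeps the new header aligned */
    if (b->size < size || b->size - size < BLOCK_SIZE + MIN_DATA)
        return 0;
    block_t *rest = (block_t *)(b->data + size); /* new header after the first part */
    rest->size = b->size - size - BLOCK_SIZE;
    rest->next = b->next;
    rest->free = 1;
    b->size = size;
    b->next = rest;
    return 1;
}
```

When the leftover is too small, the caller simply hands out the whole block and accepts a few wasted bytes.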
3.2.5 Malloc Implementation
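The full implementation of malloc combines the previous steps: align the requested size, search the list for a suitable free block, split that block if it is much larger than needed, extend the heap if no block was found, and return a pointer to the block's data area.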
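Putting the pieces together gives the following sketch, assuming Linux with glibc and a break that starts out 8-byte aligned. It is named my_malloc to avoid clashing with the real malloc; the header layout, the ALIGN8 macro, and the size cap are choices made for this example, and the code is not thread-safe.

```c
#define _DEFAULT_SOURCE      /* exposes sbrk() in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

typedef struct block {
    size_t size;             /* bytes in the data area */
    struct block *next;
    int free;
    int padding;
    char data[];
} block_t;

#define BLOCK_SIZE offsetof(block_t, data)
#define ALIGN8(n)  (((n) + 7u) & ~(size_t)7u)    /* round up to a multiple of 8 */

static block_t *heap_head = NULL;                /* first block, NULL until first call */

static block_t *find_block(block_t **last, size_t size) {
    block_t *b = heap_head;
    while (b != NULL && !(b->free && b->size >= size)) {
        *last = b;
        b = b->next;
    }
    return b;
}

static block_t *extend_heap(block_t *last, size_t size) {
    void *p = sbrk((intptr_t)(BLOCK_SIZE + size));
    if (p == (void *)-1) return NULL;
    block_t *b = p;
    b->size = size;
    b->next = NULL;
    b->free = 0;
    if (last != NULL) last->next = b;
    return b;
}

static void split_block(block_t *b, size_t size) {
    assert(b->size >= size + BLOCK_SIZE);
    block_t *rest = (block_t *)(b->data + size);
    rest->size = b->size - size - BLOCK_SIZE;
    rest->next = b->next;
    rest->free = 1;
    b->size = size;
    b->next = rest;
}

void *my_malloc(size_t size) {
    if (size > (size_t)INTPTR_MAX / 2) return NULL;   /* reject absurd sizes */
    size = ALIGN8(size);
    block_t *last = NULL;
    block_t *b = find_block(&last, size);
    if (b != NULL) {                                  /* reuse a free block */
        if (b->size >= size + BLOCK_SIZE + 8) split_block(b, size);
        b->free = 0;
    } else {                                          /* nothing fits: grow the heap */
        b = extend_heap(last, size);
        if (b == NULL) return NULL;
        if (heap_head == NULL) heap_head = b;
    }
    return b->data;
}
```

A request for 10 bytes is rounded up to 16, so every data area handed out has a size that is a multiple of 8.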
3.2.6 calloc Implementation
calloc takes an element count and an element size, allocates memory for that many elements, and initializes it to zero. This is done by calling malloc for the total size and then filling the allocated memory with zeros.
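A sketch of calloc in this style follows. To stay self-contained it is layered on the system malloc rather than on the allocator built in this article, and it is named my_calloc. The overflow check on count * size is an addition that the text does not mention.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocates count * size bytes, all set to zero. */
void *my_calloc(size_t count, size_t size) {
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                     /* count * size would overflow */
    size_t total = count * size;
    void *p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);
    return p;
}

/* Usage: every byte of a fresh block reads as zero. */
void calloc_demo(void) {
    unsigned char *p = my_calloc(4, 8);
    assert(p != NULL);
    for (size_t i = 0; i < 32; i++)
        assert(p[i] == 0);
    free(p);
}
```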
3.2.7 free Implementation
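Freeing memory involves two steps. First, free must check that the address it receives is valid, that is, that it lies inside the heap and really is the start of a data area handed out by malloc. Second, it marks the block as free and merges it with adjacent free blocks, so that the heap does not degenerate into many small fragments that are individually too small to satisfy later requests.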
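The sketch below shows one way to do the validity check and the merging. It assumes a doubly linked list (a prev field) and a magic pointer in each header that equals the address of the block's data area; heap_lo and heap_hi are hypothetical bounds of the managed region, which a real allocator would derive from its first block and sbrk(0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct block {
    size_t size;
    struct block *next, *prev;
    int free;
    int padding;
    void *ptr;               /* magic pointer: equals data for a genuine block */
    char data[];
} block_t;

#define BLOCK_SIZE offsetof(block_t, data)

char *heap_lo = NULL, *heap_hi = NULL;   /* hypothetical bounds of the heap */

block_t *get_block(void *p) { return (block_t *)((char *)p - BLOCK_SIZE); }

/* An address is accepted only if it is aligned, lies inside the heap,
 * and the header in front of it points back at it. */
int valid_addr(void *p) {
    uintptr_t a = (uintptr_t)p;
    if (heap_lo == NULL || a % sizeof(void *) != 0) return 0;
    if (a < (uintptr_t)heap_lo + BLOCK_SIZE || a > (uintptr_t)heap_hi) return 0;
    return get_block(p)->ptr == p;
}

/* Absorbs the following block into b if that block is free. */
block_t *fuse_with_next(block_t *b) {
    assert(b != NULL);
    block_t *n = b->next;
    if (n != NULL && n->free) {
        b->size += BLOCK_SIZE + n->size;
        b->next = n->next;
        if (b->next != NULL) b->next->prev = b;
        n->ptr = NULL;       /* the absorbed header is no longer a valid block */
    }
    return b;
}

void my_free(void *p) {
    if (!valid_addr(p)) return;          /* ignore NULL and foreign pointers */
    block_t *b = get_block(p);
    b->free = 1;
    if (b->prev != NULL && b->prev->free)
        b = fuse_with_next(b->prev);     /* merge backward */
    fuse_with_next(b);                   /* merge forward */
}
```

This sketch never returns memory to the OS. An implementation could additionally move the break back with brk when the last block in the heap becomes free.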
3.2.8 realloc Implementation
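realloc adjusts the size of an allocated block. If the existing block is already large enough, it can be kept, and split if much of it is left over. Otherwise realloc may merge the block with a free neighbor to gain space, or allocate a new block, copy the data across, and free the old one.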
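The keep-or-copy logic of realloc can be shown in isolation. The sketch below is a simplified stand-in, not the block-list version described above: it stores only a size header in front of each payload and uses the system malloc as its backing store, so splitting and merging are left out. The names sized_alloc, sized_free, and sized_realloc are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header in front of every payload; the union pads it to the strictest
 * alignment so that the payload stays suitably aligned. */
typedef union header {
    size_t size;             /* capacity of the payload in bytes */
    max_align_t align;
} header_t;

void *sized_alloc(size_t size) {
    if (size > SIZE_MAX - sizeof(header_t)) return NULL;
    header_t *h = malloc(sizeof(header_t) + size);
    if (h == NULL) return NULL;
    h->size = size;
    return h + 1;            /* payload starts right after the header */
}

void sized_free(void *p) {
    if (p != NULL) free((header_t *)p - 1);
}

void *sized_realloc(void *p, size_t new_size) {
    if (p == NULL) return sized_alloc(new_size);
    header_t *h = (header_t *)p - 1;
    if (new_size <= h->size) return p;   /* fits already: keep the block */
    void *q = sized_alloc(new_size);
    if (q == NULL) return NULL;          /* the old block stays valid */
    memcpy(q, p, h->size);               /* copy the old contents */
    sized_free(p);
    return q;
}

/* Usage: contents survive growing; shrinking keeps the same address. */
void realloc_demo(void) {
    char *p = sized_alloc(8);
    assert(p != NULL);
    memcpy(p, "abcdefg", 8);
    char *q = sized_realloc(p, 64);
    assert(q != NULL && memcmp(q, "abcdefg", 8) == 0);
    char *r = sized_realloc(q, 4);
    assert(r == q);
    sized_free(r);
}
```

As with the standard realloc, a failed call returns NULL and leaves the original block untouched, so the caller must not overwrite its only pointer with the result before checking it.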
3.3 Legacy Issues and Optimization
There are several areas for improvement in the current implementation, such as supporting 32-bit and 64-bit systems, using mmap for large allocations, and optimizing memory management strategies.
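As an illustration of the mmap idea, a large request can be served by an anonymous private mapping instead of moving the break. This is a sketch under the assumption of Linux with glibc; map_large and unmap_large are names chosen for this example, and the caller has to remember the size in order to unmap.

```c
#define _DEFAULT_SOURCE      /* exposes MAP_ANONYMOUS in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Maps `bytes` of zero-filled memory that is not backed by any file.
 * Returns NULL on failure (including a request for 0 bytes). */
void *map_large(size_t bytes) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns the mapping to the OS. Returns 0 on success, -1 on failure. */
int unmap_large(void *p, size_t bytes) {
    assert(p != NULL);
    return munmap(p, bytes);
}
```

Unlike memory obtained by moving the break, a mapping like this can be handed back to the OS independently of every other allocation.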
4. Other References
This article draws heavily from "A Malloc Tutorial" and other resources. For further reading, "Computer Systems: A Programmer's Perspective" and various Linux kernel documentation provide valuable insights into memory management and system-level programming.