Transparent and Cooperative Merging and Splitting of Huge Pages
- Thesis type: Bachelor's thesis
- Thesis status: reserved
- Projects: ParPerOS
- Supervisors: Alexander Halbuer, Daniel Lohmann

A huge page composed of many smaller pages. [Generated with AI]
Context
The rapid growth of main memory capacities poses significant challenges for both hardware engineers and software developers. In particular, efficiently provisioning large virtual address spaces requires substantial effort. Extending the address space requires additional page-table levels, which increase the address translation overhead. Hardware tries to hide the added latency with aggressive prefetching and complex cache hierarchies.
One strategy to alleviate pressure on the memory management unit is the introduction of huge and giant pages. These larger pages increase the granularity of the virtual-to-physical address mapping from 4 KiB to 2 MiB or 1 GiB, effectively omitting lower-level page tables. As a result, fewer steps are required to traverse the page tables and translation lookaside buffer (TLB) coverage increases, leading to fewer TLB misses. Although this approach is effective from a hardware perspective, it demands sophisticated software support to realize its benefits. Using multiple page sizes simultaneously requires robust fragmentation avoidance, which is hard to get right because larger pages need large, contiguous, and aligned physical memory ranges. Moreover, the computational overhead of active memory compaction can easily offset the performance gained from larger pages.
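The effect of the page size on walk depth and TLB coverage can be made concrete with a little arithmetic. The following is a minimal sketch, assuming x86-64-style 4-level paging with 512 entries per table; the helper names are hypothetical and the TLB entry count is a free parameter, not a property of any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Page sizes named in the text; each step up skips one page-table level. */
enum page_size { PAGE_4K = 0, PAGE_2M = 1, PAGE_1G = 2 };

/* Size in bytes: every skipped level adds 9 address bits (512 entries). */
static uint64_t page_bytes(enum page_size ps)
{
    assert(ps <= PAGE_1G);
    return UINT64_C(4096) << (9 * (unsigned)ps);
}

/* Page-table levels visited on a TLB miss (assumption: 4-level paging). */
static unsigned walk_levels(enum page_size ps)
{
    assert(ps <= PAGE_1G);
    return 4u - (unsigned)ps;
}

/* Amount of memory that tlb_entries TLB entries can cover at once. */
static uint64_t tlb_reach(uint64_t tlb_entries, enum page_size ps)
{
    return tlb_entries * page_bytes(ps);
}
```

With 1024 entries, for example, the reach grows from 4 MiB with 4 KiB pages to 2 GiB with 2 MiB pages, while each miss walks three levels instead of four.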
In contrast, Morsels address the memory challenge from an orthogonal perspective. Rather than focusing on address translation overhead, we initially targeted the substantial bookkeeping overhead associated with managing vast amounts of memory as numerous small 4 KiB pages. Technically, Morsels represent a subtree of the page table hierarchy, managed as a single, indivisible object, enabling fast transfer between address spaces.
Problem
The Morsel concept has shown promising results in terms of memory management, but it still has some room for improvement regarding translation overhead. An initial implementation introduced huge and giant page support to Morsels; however, this was done in an all-or-nothing manner.
While larger page sizes can reduce address translation overhead and enhance application performance, the actual benefits depend on the access pattern and data layout. For sparsely used memory ranges, internal fragmentation and the resulting memory waste can outweigh the potential performance gains. Huge pages should therefore be used only where they provide an advantage, which requires deciding on the page size at a finer granularity than the whole object.
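The cost of internal fragmentation is easy to quantify. As a small sketch (the function name and the idea of counting touched 4 KiB pages are illustrative assumptions, not part of the existing implementation), the memory wasted by backing a sparsely used 2 MiB range with a single huge page is:

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE UINT64_C(4096)
#define HUGE_PAGE  (UINT64_C(512) * SMALL_PAGE)   /* 2 MiB */

/*
 * Bytes that are allocated but never used when a 2 MiB range in which only
 * touched_small_pages 4 KiB pages are accessed is backed by one huge page.
 */
static uint64_t huge_page_waste(unsigned touched_small_pages)
{
    assert(touched_small_pages <= 512);
    return HUGE_PAGE - touched_small_pages * SMALL_PAGE;
}
```

If an application touches only 8 of the 512 small pages, a huge page wastes 2,064,384 bytes, that is, more than 98 % of the range.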
A follow-up thesis moved the implementation from a global to a range-based page-size selection. The desired page size is encoded into non-present page table entries ahead of use, allowing for range-specific definition. A page of the configured size is allocated on demand (e.g., during a page fault).
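To illustrate the idea, here is a minimal sketch of such an encoding. It assumes an x86-64-style entry in which bit 0 is the present bit and the hardware ignores all other bits of a non-present entry; the bit positions and names (`HINT_SHIFT`, `pte_make_hint`, and so on) are hypothetical and do not reflect the actual layout used in the Morsel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_PRESENT UINT64_C(0x1)
#define HINT_SHIFT  1                              /* hypothetical position */
#define HINT_MASK   (UINT64_C(0x3) << HINT_SHIFT)  /* two software bits */

enum size_hint { HINT_NONE = 0, HINT_4K = 1, HINT_2M = 2, HINT_1G = 3 };

static bool pte_present(pte_t pte)
{
    return (pte & PTE_PRESENT) != 0;
}

/* Build a non-present entry that only records the desired page size. */
static pte_t pte_make_hint(enum size_hint hint)
{
    return ((pte_t)hint << HINT_SHIFT) & HINT_MASK; /* present bit stays 0 */
}

/* Read the hint back, e.g. in the page-fault handler before allocating. */
static enum size_hint pte_get_hint(pte_t pte)
{
    assert(!pte_present(pte)); /* bits have another meaning once present */
    return (enum size_hint)((pte & HINT_MASK) >> HINT_SHIFT);
}
```

A fault handler following this scheme would read the hint from the faulting entry, allocate a page of that size, and then overwrite the entry with a present mapping.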
Although this approach allows for fine-grained specification of the desired page size, it still does not allow for adaptation to evolving access characteristics. Therefore, dynamic reconfiguration is desirable. While splitting a huge page is trivial, merging smaller pages into a bigger one transparently requires careful synchronization. To decrease the penalty for applications accessing a page "under merge", the colliding thread should actively contribute its computing power to speed up the operation. To achieve an actual benefit, lightweight synchronization is required here.
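One possible shape of such a cooperation scheme is sketched below as a user-space model using C11 atomics. It is an assumption for illustration, not the intended design: the `merge_job` structure and the function names are hypothetical, and a real kernel implementation would additionally have to block writes to the small pages during the copy and install the huge mapping afterwards. Every thread that calls `merge_help`, whether it started the merge or merely faulted on the range, claims 4 KiB chunks through a shared atomic counter until none are left.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_PAGE 4096u
#define NPAGES     512u   /* 4 KiB pages per 2 MiB huge page */

struct merge_job {
    unsigned char *src[NPAGES]; /* scattered small pages to merge */
    unsigned char *dst;         /* contiguous 2 MiB target page */
    atomic_uint next;           /* index of the next unclaimed chunk */
    atomic_uint done;           /* number of chunks fully copied */
};

static void merge_init(struct merge_job *job, unsigned char *dst)
{
    job->dst = dst;
    atomic_init(&job->next, 0);
    atomic_init(&job->done, 0);
}

/*
 * Copy chunks until all are claimed. Returns true for exactly one caller:
 * the one that completed the last outstanding chunk and may therefore
 * install the huge mapping.
 */
static bool merge_help(struct merge_job *job)
{
    bool finished = false;

    for (;;) {
        unsigned i = atomic_fetch_add(&job->next, 1);
        if (i >= NPAGES)
            break;                       /* nothing left to claim */
        assert(job->src[i] != NULL);
        memcpy(job->dst + (size_t)i * SMALL_PAGE, job->src[i], SMALL_PAGE);
        if (atomic_fetch_add(&job->done, 1) + 1 == NPAGES)
            finished = true;             /* this was the last copy */
    }
    return finished;
}
```

The only shared state is two counters, so a colliding thread pays a single atomic increment per claimed chunk. Whether this is lightweight enough in practice is exactly what the evaluation would have to show.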
The same approach could also be applied to the copy-on-write implementation to speed up the costly resolution of shared huge pages on a write fault.
Goal
The primary objective of this thesis is to implement and evaluate the dynamic reconfiguration of page sizes. To achieve this, the existing mechanism should be extended to not only encode the desired page size for later allocation but also split/merge pages if a range is already backed by memory.
To ensure seamless integration, this operation must work completely transparently, allowing other threads to concurrently access the Morsel object without interruption. If a concurrent thread hits a location that is currently under merge, it should participate in the operation to speed it up.
The thesis could follow these key steps:
- Getting started: Familiarize yourself with kernel development, set up a suitable development environment, and establish a functional test setup.
- Splitting: Implement page splitting by setting up a new page table and then atomically replacing the old page table entry with a new one that references the new table instead of the huge page directly.
- Merging: Implement page merging; concurrent accesses to the range being merged must be delayed until the operation completes.
- Cooperative Merging: Let threads that access a range under merge participate in the operation to speed it up instead of only waiting.
- Evaluation: Evaluate the implementation with microbenchmarks targeting the new functionality in different settings and application-level benchmarks that highlight the benefits of the mixed page sizes.
- Optional Extension 1: Extend the system call to automatically select the optimal page size based on sparsity and memory usage.
- Optional Extension 2: Bring the cooperation mechanism to copy-on-write to also speed up the resolution of shared huge pages.
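The splitting step from the list above can be sketched as a user-space model with C11 atomics. This is a simplified illustration under stated assumptions: the flag bits follow the x86-64 layout (present, writable, and the page-size bit 7), user-space pointers stand in for physical addresses, and the names `split_huge` and `table_of` are hypothetical. TLB invalidation and the attribute bits specific to huge entries are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PTE_PRESENT UINT64_C(0x001)
#define PTE_WRITE   UINT64_C(0x002)
#define PTE_HUGE    UINT64_C(0x080)              /* x86-64 page-size bit */
#define ADDR_MASK   UINT64_C(0x000ffffffffff000)
#define ENTRIES     512

typedef _Atomic uint64_t pte_slot_t;

/* Model only: interpret the address bits of an entry as a table pointer. */
static uint64_t *table_of(uint64_t entry)
{
    return (uint64_t *)(uintptr_t)(entry & ADDR_MASK);
}

/* Replace a 2 MiB mapping by a table of 512 equivalent 4 KiB mappings. */
static bool split_huge(pte_slot_t *slot)
{
    uint64_t old = atomic_load(slot);
    if (!(old & PTE_PRESENT) || !(old & PTE_HUGE))
        return false;                        /* nothing to split */

    uint64_t base  = old & ADDR_MASK;
    uint64_t flags = old & ~ADDR_MASK & ~PTE_HUGE;
    assert((base & UINT64_C(0x1fffff)) == 0); /* 2 MiB aligned frame */

    /* Prepare the new table completely before it becomes visible. */
    uint64_t *table = aligned_alloc(4096, ENTRIES * sizeof(uint64_t));
    if (table == NULL)
        return false;
    for (uint64_t i = 0; i < ENTRIES; i++)
        table[i] = (base + i * 4096) | flags;

    /* Publish with a single compare-and-swap; back off if we lost a race. */
    uint64_t new_entry = ((uint64_t)(uintptr_t)table & ADDR_MASK)
                         | PTE_PRESENT | PTE_WRITE;
    if (!atomic_compare_exchange_strong(slot, &old, new_entry)) {
        free(table);
        return false;
    }
    return true;
}
```

Because the new table maps exactly the same frames with the same permissions, a concurrent access sees either the old huge mapping or the complete new table, never an intermediate state.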
Topics: C, Linux kernel, huge pages, virtual memory, paging
References
Publications
- Morsels: Explicit Virtual Memory Objects. In: Proceedings of the 1st Workshop on Disruptive Memory Systems (DIMES). Association for Computing Machinery, 2023. DOI: 10.1145/3609308.3625267
Related Theses
Transparent Huge Pages for Virtual-Memory Objects
- Type: Bachelor's thesis
- Status: completed
- Supervisors: Alexander Halbuer, Daniel Lohmann
- Project: ParPerOS
- Author: Marvin Steiner (submitted: 29 Dec 2025)
Huge Pages for Virtual-Memory Objects
- Type: Bachelor's thesis
- Status: completed
- Supervisors: Alexander Halbuer, Daniel Lohmann
- Project: ParPerOS
- Author: Marko Bolowski (submitted: 18 Mar 2024)
