Transparent and Cooperative Merging and Splitting of Huge Pages


A huge page composed of many smaller pages. [Generated with AI]

Context

The rapid growth of main memory capacities poses significant challenges for both hardware engineers and software developers. In particular, providing large virtual address spaces efficiently is costly. Extending the address space requires additional page-table levels, and each additional level adds another memory access to a full page-table walk, increasing address translation overhead. To mitigate this, hardware employs aggressive prefetching and complex cache hierarchies that try to hide the added latency.
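
As a rough illustration, the following sketch assumes x86-64-style radix page tables, where the offset within a 4 KiB page takes 12 bits and each table level contributes 9 index bits (512 entries per table). Under this assumption, four levels yield the familiar 48-bit (256 TiB) address space and a fifth level 57 bits (128 PiB), while each extra level also adds one table read to a full walk. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64-style layout: 4 KiB pages (12 offset bits) and
 * 512-entry tables (9 index bits per level). */
enum { PAGE_SHIFT = 12, BITS_PER_LEVEL = 9 };

/* Virtual address bits translatable with `levels` table levels. */
static unsigned va_bits(unsigned levels)
{
    assert(levels >= 1 && levels <= 5);
    return PAGE_SHIFT + BITS_PER_LEVEL * levels;
}

/* Size of that virtual address space in TiB (2^40 bytes). */
static uint64_t va_space_tib(unsigned levels)
{
    return (UINT64_C(1) << va_bits(levels)) >> 40;
}
```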

One strategy to alleviate pressure on the memory management unit is the introduction of huge and giant pages. On x86-64, these larger pages increase the granularity of the virtual-to-physical address mapping from 4 KiB to 2 MiB or 1 GiB, effectively omitting the lower page-table levels. As a result, fewer steps are required to traverse the page tables, and translation lookaside buffer (TLB) coverage increases, leading to fewer TLB misses. Although this approach is effective from a hardware perspective, it demands sophisticated software support to realize its benefits. Because a huge page must be backed by a physically contiguous, aligned memory region, using multiple page sizes simultaneously requires robust mechanisms to avoid fragmentation of physical memory, and such mechanisms are challenging to build. Moreover, the computational overhead of active memory compaction can easily offset the advantages of this approach.
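
The effect on walk length and TLB coverage can be quantified with a small sketch. It assumes x86-64 four-level paging, where a 4 KiB page is mapped at the lowest level, a 2 MiB page one level higher (PMD), and a 1 GiB page two levels higher (PUD). The TLB entry count passed to tlb_reach() is a hypothetical parameter, since real TLBs differ between microarchitectures and often between page sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Page sizes of x86-64 four-level paging, ordered by mapping level. */
enum page_size { PS_4K, PS_2M, PS_1G };

static const uint64_t page_bytes[] = {
    [PS_4K] = UINT64_C(1) << 12,
    [PS_2M] = UINT64_C(1) << 21,
    [PS_1G] = UINT64_C(1) << 30,
};

/* Table levels read during a full walk: the walk stops at the level
 * that maps the page (4 for 4 KiB, 3 for 2 MiB, 2 for 1 GiB). */
static unsigned walk_steps(enum page_size ps)
{
    return 4 - (unsigned)ps;
}

/* TLB reach: memory covered by `entries` TLB entries of size ps.
 * The entry count is an assumed, hardware-specific parameter. */
static uint64_t tlb_reach(unsigned entries, enum page_size ps)
{
    assert(ps <= PS_1G);
    return (uint64_t)entries * page_bytes[ps];
}
```

With the same (assumed) number of entries, switching from 4 KiB to 2 MiB pages multiplies the TLB reach by 512, for example from 6 MiB to 3 GiB for 1536 entries.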

In contrast, Morsels address the memory challenge from an orthogonal perspective. Rather than focusing on address translation overhead, we initially targeted the substantial bookkeeping overhead associated with managing vast amounts of memory as numerous small 4 KiB pages. Technically, Morsels represent a subtree of the page table hierarchy, managed as a single, indivisible object, enabling fast transfer between address spaces.
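
To illustrate why treating a subtree as one object makes transfers cheap, the following toy model (hypothetical names, not the Morsel implementation) represents each page table as 512 pointers to lower-level tables. Mapping or moving a Morsel touches exactly one entry in the parent table, independent of how many pages the subtree maps. A real kernel would store physical addresses with flag bits and must also invalidate stale TLB entries after unmapping.

```c
#include <assert.h>
#include <stddef.h>

#define PT_ENTRIES 512

/* Toy page table: each entry is NULL or points to a lower table. */
struct table {
    struct table *entry[PT_ENTRIES];
};

/* Hypothetical morsel: owns a complete subtree rooted at one table. */
struct morsel {
    struct table *root;
};

/* Mapping writes a single parent entry, whatever the subtree size. */
static int morsel_map(struct table *parent, size_t idx, struct morsel *m)
{
    assert(idx < PT_ENTRIES);
    if (parent->entry[idx] != NULL)
        return -1;                    /* slot already in use */
    parent->entry[idx] = m->root;
    return 0;
}

static void morsel_unmap(struct table *parent, size_t idx)
{
    assert(idx < PT_ENTRIES);
    parent->entry[idx] = NULL;        /* a kernel would also flush the TLB */
}

/* Transfer between address spaces: one entry cleared, one entry set. */
static int morsel_move(struct table *from, size_t from_idx,
                       struct table *to, size_t to_idx, struct morsel *m)
{
    if (from->entry[from_idx] != m->root || to->entry[to_idx] != NULL)
        return -1;
    morsel_unmap(from, from_idx);
    return morsel_map(to, to_idx, m);
}
```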

Problem

The Morsel concept has shown promising results in terms of memory management, but it still leaves room for improvement regarding translation overhead. An initial implementation introduced huge and giant page support to Morsels; however, this was done in an all-or-nothing manner: the entire object used a single page size.

Larger page sizes can reduce address translation overhead and enhance application performance, but the actual benefits depend on the access pattern and data layout. For sparsely used memory ranges, internal fragmentation and the resulting memory waste can outweigh the potential performance gains. Therefore, huge pages should be used only where they provide advantages, which requires a finer-grained decision than one made at the object level.
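
The trade-off can be made concrete for one 2 MiB range containing 512 subpages of 4 KiB. The sketch below computes the memory wasted when the range is backed by a single huge page although only some subpages are used, and it applies a simple occupancy threshold. The threshold policy and its parameter are hypothetical; they only illustrate the kind of per-range decision described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMALL_PAGE (UINT64_C(4) << 10)          /* 4 KiB */
#define HUGE_PAGE  (UINT64_C(2) << 20)          /* 2 MiB */
#define SUBPAGES   (HUGE_PAGE / SMALL_PAGE)     /* 512 */

/* Bytes allocated but never used when a 2 MiB range with `touched`
 * distinct 4 KiB pages in use is backed by one huge page. */
static uint64_t huge_page_waste(unsigned touched)
{
    assert(touched <= SUBPAGES);
    return HUGE_PAGE - (uint64_t)touched * SMALL_PAGE;
}

/* Hypothetical policy: use a huge page only if at least
 * `min_percent` of the range's 4 KiB subpages are in use. */
static bool prefer_huge_page(unsigned touched, unsigned min_percent)
{
    assert(touched <= SUBPAGES && min_percent <= 100);
    return (uint64_t)touched * 100 >= (uint64_t)min_percent * SUBPAGES;
}
```

For example, with 8 of 512 subpages in use, a huge page wastes 2,064,384 bytes (about 98% of the 2 MiB), whereas prefer_huge_page(256, 50) accepts a half-used range.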

A follow-up thesis moved the implementation from a per-object to a range-based page-size selection. The desired page size is encoded into non-present page table entries ahead of use, so it can be defined per range. A page of the configured size is then allocated on demand (e.g., during a page fault).
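
A minimal sketch of such an encoding, under stated assumptions: on x86-64, bit 0 of a page-table entry is the Present bit, and when it is clear the hardware ignores the remaining bits, so software may store metadata there. The concrete bit positions and names below (HINT_SHIFT, encode_hint(), fault_alloc_size()) are hypothetical and do not describe the thesis implementation; a real implementation would also place the hint at the table level that matches the page size.

```c
#include <assert.h>
#include <stdint.h>

#define E_PRESENT  (UINT64_C(1) << 0)
/* Hypothetical: two software bits in a non-present entry hold a hint. */
#define HINT_SHIFT 1
#define HINT_MASK  (UINT64_C(3) << HINT_SHIFT)

enum size_hint { HINT_NONE = 0, HINT_4K = 1, HINT_2M = 2, HINT_1G = 3 };

/* Record the desired page size in a non-present entry ahead of use. */
static uint64_t encode_hint(uint64_t entry, enum size_hint h)
{
    assert(!(entry & E_PRESENT));
    return (entry & ~HINT_MASK) | ((uint64_t)h << HINT_SHIFT);
}

static enum size_hint decode_hint(uint64_t entry)
{
    if (entry & E_PRESENT)
        return HINT_NONE;             /* already backed by memory */
    return (enum size_hint)((entry & HINT_MASK) >> HINT_SHIFT);
}

/* Simplified fault path: bytes to allocate for this entry, falling
 * back to 4 KiB without a hint; 0 if the entry is already present. */
static uint64_t fault_alloc_size(uint64_t entry)
{
    if (entry & E_PRESENT)
        return 0;
    switch (decode_hint(entry)) {
    case HINT_2M: return UINT64_C(2) << 20;
    case HINT_1G: return UINT64_C(1) << 30;
    default:      return UINT64_C(4) << 10;
    }
}
```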

Although this approach allows fine-grained specification of the desired page size, it cannot adapt to evolving access characteristics. Dynamic reconfiguration is therefore desirable. Splitting a huge page is comparatively simple, because its memory is already physically contiguous and only the mapping has to change. Merging smaller pages into a bigger one, in contrast, generally requires copying the scattered small pages into a contiguous frame, and doing so transparently while other threads may access the range requires careful synchronization. To decrease the penalty for applications accessing a page "under merge", the colliding thread should actively contribute its computing power to speed up the operation. For this to yield an actual benefit, the synchronization must be lightweight.
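
The cooperative idea can be sketched in user space with C11 atomics and POSIX threads. In this hypothetical model, a merge descriptor is published while the range is under merge; every thread that runs into the range calls merge_help(), claims the next uncopied 4 KiB subpage with a single atomic fetch-and-add, and copies it into the contiguous destination. No lock is taken, and each additional helper shortens the remaining copy time. The sketch deliberately omits the kernel-side steps: preventing writes to the source pages during the copy (for example, by write-protecting their entries first), installing the new huge-page entry atomically, and flushing the TLB.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_PAGE 4096
#define SUBPAGES   512                 /* 2 MiB / 4 KiB */

/* Hypothetical descriptor published while a range is under merge. */
struct merge {
    unsigned char *src[SUBPAGES];      /* scattered 4 KiB source pages */
    unsigned char *dst;                /* contiguous 2 MiB destination */
    atomic_uint next;                  /* next subpage to claim */
    atomic_uint done;                  /* subpages copied so far */
};

/* Claim subpages with one fetch-add each and copy them; return only
 * once every subpage has been copied by some participant. */
static void merge_help(struct merge *m)
{
    unsigned i;
    while ((i = atomic_fetch_add(&m->next, 1)) < SUBPAGES) {
        memcpy(m->dst + (size_t)i * SMALL_PAGE, m->src[i], SMALL_PAGE);
        atomic_fetch_add_explicit(&m->done, 1, memory_order_release);
    }
    /* Other helpers may still be copying their last claimed subpage. */
    while (atomic_load_explicit(&m->done, memory_order_acquire) < SUBPAGES)
        sched_yield();
}

static void *helper(void *arg)
{
    merge_help(arg);
    return NULL;
}

/* Demo: `nthreads` threads cooperate on one merge; returns 1 if the
 * destination holds every source page in order. */
static int run_merge(int nthreads)
{
    pthread_t tid[16];
    struct merge m;
    int ok = 1;

    if (nthreads < 1 || nthreads > 16)
        return 0;
    m.dst = malloc((size_t)SUBPAGES * SMALL_PAGE);
    assert(m.dst != NULL);             /* sketch-level error handling */
    atomic_init(&m.next, 0);
    atomic_init(&m.done, 0);
    for (int i = 0; i < SUBPAGES; i++) {
        m.src[i] = malloc(SMALL_PAGE);
        assert(m.src[i] != NULL);
        memset(m.src[i], i & 0xff, SMALL_PAGE);
    }
    for (int t = 0; t < nthreads; t++)
        if (pthread_create(&tid[t], NULL, helper, &m) != 0)
            abort();
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    for (int i = 0; i < SUBPAGES; i++) {
        ok &= memcmp(m.dst + (size_t)i * SMALL_PAGE, m.src[i], SMALL_PAGE) == 0;
        free(m.src[i]);
    }
    free(m.dst);
    return ok;
}
```

For example, run_merge(4) lets four threads assemble one 2 MiB page and returns 1 if the result matches the source pages.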

The same approach could also be applied to the copy-on-write implementation to speed up the costly resolution of shared huge pages on a write fault.

Goal

The primary objective of this thesis is to implement and evaluate the dynamic reconfiguration of page sizes. To achieve this, the existing mechanism should be extended to not only encode the desired page size for later allocation but also to split or merge pages when a range is already backed by memory.

To ensure seamless integration, this operation must be completely transparent, allowing other threads to access the Morsel object concurrently without interruption. If a concurrent thread hits a location that is currently under merge, it should participate in the operation to speed it up.

The thesis could follow these key steps:

  1. Getting started: Become familiar with kernel development, set up a suitable development environment, and establish a functional test setup.
  2. Splitting: Implement page splitting by setting up a new page table and then atomically replacing the old page table entry with a new one that references the new table instead of the huge page directly.
  3. Merging: Implement page merging; concurrent accesses to the affected range must be delayed until the merge has completed.
  4. Cooperative Merging: Instead of waiting, concurrent accesses participate in the merge to speed it up.
  5. Evaluation: Evaluate the implementation with microbenchmarks targeting the new functionality in different settings and application-level benchmarks that highlight the benefits of the mixed page sizes.
  6. Optional Extension 1: Extend the system call to automatically select the optimal page size based on sparsity and memory usage.
  7. Optional Extension 2: Bring the cooperation mechanism to copy-on-write to also speed up the resolution of shared huge pages.
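
Step 2 can be sketched on a simulated page-directory entry. The sketch assumes x86-64 entry formats: bit 7 (PS) in a PMD entry marks a direct 2 MiB mapping whose frame address occupies bits 21 to 51, and a table pointer occupies bits 12 to 51. It first fills a new table with 512 consecutive 4 KiB mappings of the same frame, then swings the PMD entry to that table with one compare-and-swap, so a concurrent reader sees either the old huge mapping or the complete new table. The names, the copied flag subset, and the omission of PAT handling and TLB flushing are simplifications made for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 entry bits: Present (0), Read/Write (1), PS (7). */
#define E_PRESENT       (UINT64_C(1) << 0)
#define E_RW            (UINT64_C(1) << 1)
#define E_PS            (UINT64_C(1) << 7)
/* Flags copied to the 4 KiB entries: bits 0-6, Global (8), NX (63). */
#define FLAG_MASK       (UINT64_C(0x17f) | (UINT64_C(1) << 63))
#define HUGE_ADDR_MASK  UINT64_C(0x000fffffffe00000)  /* bits 21-51 */
#define TABLE_ADDR_MASK UINT64_C(0x000ffffffffff000)  /* bits 12-51 */
#define SUBPAGES        512
#define SMALL_PAGE      UINT64_C(4096)

/* Fill a fresh PTE table that maps the huge page's 2 MiB frame as 512
 * consecutive 4 KiB pages with the same permissions (PS dropped). */
static void build_split_table(uint64_t pmd, uint64_t pte[SUBPAGES])
{
    assert(pmd & E_PS);
    uint64_t base = pmd & HUGE_ADDR_MASK;
    uint64_t flags = pmd & FLAG_MASK;
    for (unsigned i = 0; i < SUBPAGES; i++)
        pte[i] = (base + i * SMALL_PAGE) | flags;
}

/* Replace a huge-page PMD entry with a pointer to the new table using
 * one compare-and-swap; returns false if the entry is not a present
 * huge page or changed concurrently (the caller may then retry). */
static bool split_huge_entry(_Atomic uint64_t *pmd, uint64_t table_addr,
                             uint64_t pte[SUBPAGES])
{
    uint64_t old = atomic_load(pmd);
    if (!(old & E_PRESENT) || !(old & E_PS))
        return false;
    build_split_table(old, pte);
    uint64_t entry = (table_addr & TABLE_ADDR_MASK) | E_PRESENT | E_RW;
    return atomic_compare_exchange_strong(pmd, &old, entry);
}
```

If the compare-and-swap fails, for example because the hardware set the Accessed or Dirty bit in the meantime, the caller can rebuild the table from the updated entry and retry.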

Topics: C, Linux kernel, huge pages, virtual memory, paging

References

Publications

Alexander Halbuer, Christian Dietrich, Florian Rommel, and Daniel Lohmann. 2023. Morsels: Explicit Virtual Memory Objects. In Proceedings of the 1st Workshop on Disruptive Memory Systems (DIMES). Association for Computing Machinery. DOI: 10.1145/3609308.3625267

Transparent Huge Pages for Virtual-Memory Objects

Morsels are currently limited to a single, unified page size per object (4 KiB, 2 MiB, or 1 GiB), which must be defined at creation time. To better accommodate real-world application needs, a more flexible implementation that supports variable page sizes is required.

 
Type: Bachelor's thesis
Status: completed
Supervisors: Alexander Halbuer, Daniel Lohmann
Project: ParPerOS
Student: Marvin Steiner (submitted: 29 Dec 2025)

Huge Pages for Virtual-Memory Objects

Currently, Morsels only support 4 KiB pages. Larger page sizes could further reduce the management overhead and lower the average memory access latency through faster page-table walks and increased TLB coverage.

 
Type: Bachelor's thesis
Status: completed
Supervisors: Alexander Halbuer, Daniel Lohmann
Project: ParPerOS
Student: Marko Bolowski (submitted: 18 Mar 2024)