Application-Specific Hash-Table Optimization for Scalable Metadata-Management in the Linux Kernel
- Typ der Arbeit: Bachelorarbeit
- Status der Arbeit: laufend
- Projekte: ParPerOS
- Betreuer: Alexander Halbuer, Daniel Lohmann
- Bearbeiter: Linus Wolkersdorfer

The rosebush hashtable: A main array (chain) with huge buckets to hold multiple elements at once. [Generated with AI]
This thesis is a follow-up work of a preceding bachelor thesis. For full motivation and context also look there.
Context
Modern computer systems can house vast amounts of main memory, potentially heterogeneous, and offer advanced memory features such as encryption and error correction. The operating system's memory subsystem must effectively manage these diverse properties while maintaining decent performance, as it frequently resides on the critical path of program execution.
Our previous work, Morsels, introduced a novel memory-management paradigm that shifts from traditional page-based management to larger virtual-memory objects, technically represented as subtrees within the page-table hierarchy. This approach reduces management overhead and enables very fast transfer between address spaces.
Problem
We already analyzed different dynamic data structures to replace the static struct page for reference counting in our Morsel implementation.
Hash tables turned out to be the most suitable fit for this use case as they provide a constant access latency (in the average case) independent of the number stored elements.
In the preceding thesis, the Rosebush hash table showed promising results. The main difference to the established Rhastable is the use of fixed arrays as buckets instead of relying on linked lists for resolving hash collisions. This design choice results in less flexibility (fixed number of slots inside the array) but also reduces the access latency (less random memory accesses). Nevertheless, increment and decrement operations (required by the Copy-on-Write implementation) are at least twice as expensive as with a statically allocated array. Especially memory allocations, which are required to insert a new element, bring down performance.
Goal
There are several possible optimizations to tailer the Rosebush hash table to our specific use case. The most promising idea is to directly store the reference counters in-place inside the buckets instead of having an additional pointer reference. This change would reduce the lookup latency (one indirect memory access less) as well as reducing the memory footprint (one pointer per reference count less). On the other hand, this complicates management routines as the stored data and the data structure are now tightly coupled.
Further approaches could be:
- Asynchronous resizing: Growing currently happens synchronously in the insert path; shrinking is not implemented at all.
- Densely packed data: Hashes are currently 64 bit, but we only need 40 bit; ref counts are 64 bit as well.
- Lock-free/scalable algorithms: The current implementation uses fine-grained locking.
- …
Schedule
- Getting started: Familiarize with kernel development, set up a suitable development environment, and establish a functional test setup.
- Measure the baseline: Measure the key characteristics of the current Rosebush implementation.
- Implementation/Optimization: Apply the optimizations to the data structure.
- Evaluation: Quantify the performance gains and memory savings.
Topics: C, Linux kernel, copy-on-write, hash table, rosebush
References
Web Links
- Rosebush, a new hash table
- The rhashtable documentation I wanted to read
- Rhashtables: under the hood
Related Theses
Dynamic Metadata Management for Virtual-Memory Objects
- Typ
- Bachelorarbeit
- Status
- abgeschlossen
- Supervisors
- Alexander Halbuer
Daniel Lohmann - Project
- ParPerOS
- Bearbeiter
- Alex Alfonso Trigo (abgegeben: 07. Aug 2025)
Efficient Copy-on-Write for Virtual-Memory Objects
- Typ
- Masterarbeit
- Status
- abgeschlossen
- Supervisors
- Alexander Halbuer
Daniel Lohmann - Project
- ParPerOS
- Bearbeiter
- Pasha Fistanto (abgegeben: 02. May 2024)
Optimizing Memory Metadata: Dynamic Allocation of Struct Pages in the Linux Kernel
- Typ
- Masterarbeit
- Status
- abgeschlossen
- Supervisors
- Lars Wrenger
Alexander Halbuer
Daniel Lohmann - Project
- ParPerOS
- Bearbeiter
- Paul Aumann (abgegeben: 14. Jun 2024)
