Pangram verdict · v3.3
We believe that this document is fully human-written
AI likelihood · overall
HumanArticle text · 1,722 words · 6 segments analyzed
Many people, including myself, have implemented garbage collection (GC) libraries for Rust. Manish Goregaokar wrote up a fantastic survey of this space a few years ago. These libraries aim to provide a safe API for their users to consume: an unsafe-free interface which soundly encapsulates and hides the library’s internal unsafe code. The one exception is their mechanism to enumerate the outgoing GC edges of user-defined GC types, since failure to enumerate all edges can lead the collector to believe that an object is unreachable and collect it, despite the fact that the user still has a reference to the reclaimed object, leading to use-after-free bugs.1 This functionality is generally exposed as an unsafe trait for the user to implement because it is the user’s responsibility, not the library’s, to uphold this particular critical safety invariant.However, despite providing safe interfaces, all of these libraries make extensive use of unsafe code in their internal implementations. I’ve always believed it was possible to write a garbage collection library without any unsafe code, and no one I’ve asserted this to has disagreed, but there has never been a proof by construction.So, finally, I created the safe-gc crate: a garbage collection library for Rust with zero unsafe code. No unsafe in the API. No unsafe in the implementation. It even has a forbid(unsafe_code) pragma at the top.That said, safe-gc is not a particularly high-performance garbage collector.Using safe-gcTo use safe-gc, first we define our GC-managed types, using Gc<T> to define references to other GC-managed objects, and implement the Trace trait to report each of those GC edges to the collector:use safe_gc::{Collector, Gc, Trace};
// Define a GC-managed object. struct List { value: u32,
// GC-managed references to the next and previous links in the list. prev: Option<Gc<List>>, next: Option<Gc<List>>, }
// Report GC edges to the collector.
impl Trace for List { fn trace(&self, collector: &mut Collector) { if let Some(prev) = self.prev { collector.edge(prev); } if let Some(next) = self.next { collector.edge(next); } } }This looks pretty similar to other GC libraries in Rust, although it could definitely benefit from an implementation of Trace for Option<T> and a derive(Trace) macro. The big difference from existing GC libraries is that Trace is safe to implement; more on this later.Next, we create one or more Heaps to allocate our objects within. Each heap is independently garbage collected.use safe_gc::Heap;
let mut heap = Heap::new();And with a Heap in hand, we can allocate objects:let a = heap.alloc(List { value: 42, prev: None, next: None, });
let b = heap.alloc(List { value: 36, prev: Some(a.into()), next: None, });
// Create a bunch of garbage! Who cares! It'll all be cleaned // up eventually! for i in 0..100 { let _ = heap.alloc(List { value: i, prev: None, next: None, }); }The heap will automatically trigger garbage collections, as necessary, but we can also force a collection if we want:// Force a garbage collection! heap.gc()Rather than deref’ing Gc<T> pointers directly, we must index into the Heap to access the referenced T object. This contrasts with other GC libraries and is the key that unlocks safe-gc’s lack of unsafe code, allowing the implementation to abide by Rust’s ownership and borrowing discipline.2// Read from a GC object in the heap. let b_value = heap[&b].value; assert_eq!(b_value, 36);
// Write to a GC object in the heap. heap[&b].value += 1; assert_eq!(heap[&b].value, 37);Finally, there are actually two types for indexing into Heaps to access GC objects: Gc<T>, which we have seen already, and Root<T>, which we have also seen in action, but which was hidden from us by type inference.
The Gc<T> type is Copy and should be used when referencing other GC-managed objects from within a GC-managed object’s type definition, or when you can prove that a garbage collection will not happen (i.e. you have a shared borrow of its heap). A Gc<T> does not root its referenced T, keeping it alive across garbage collections, and therefore Gc<T> should not be used to hold onto GC references across any operation that can trigger a garbage collection.A Root<T>, on the other hand, does indeed root its associated T object, preventing the object from being reclaimed during garbage collection. This makes Root<T> suitable for holding references to GC-managed objects across operations that can trigger garbage collections. Root<T> is not Copy because dropping it must remove its entry from the heap’s root set. Allocation returns rooted references; all the heap.alloc(...) calls from our earlier examples returned Root<T>s.Peeking Under the HoodA safe_gc::Heap is more similar to an arena newtype over a Vec than an engineered heap with hierarchies of regions like Immix. Its main storage is a hash map from std::any::TypeId to uniform arenas of the associated type. This lets us ultimately use Vec as the storage for heap-allocated objects, and we don’t need to do any unsafe pointer arithmetic or worry about splitting large blocks in our free lists. In fact, the free lists only manage indices, not blocks of raw memory.pub struct Heap { // A map from `type_id(T)` to `Arena<T>`. The `ArenaObject` // trait facilitates crossing the boundary from an untyped // heap to typed arenas. arenas: HashMap<TypeId, Box<dyn ArenaObject>>,
// ... }
struct Arena<T> { elements: FreeList<T>,
// ... }
enum FreeListEntry<T> { /// An occupied entry holding a `T`. Occupied(T),
/// A free entry that is also part of a linked list /// pointing to the next free entry, if any. Free(Option<u32>), }
struct FreeList<T> { // The actual backing storage for our `T`s.
entries: Vec<FreeListEntry<T>>,
/// The index of the first free entry in the free list. free: Option<u32>,
// ... }To allocate a new T in the heap, we first get the T object arena out of the heap’s hash map, or create it if it doesn’t exist yet. Then, we check if the arena has capacity to allocate our new T. If it does, we push the object onto the arena and return a rooted reference. If it does not, we fall back to an out-of-line slow path where we trigger a garbage collection to ensure that we have space for the new object, and then try again.impl Heap { #[inline] pub fn alloc<T>(&mut self, value: T) -> Root<T> where T: Trace, { let heap_id = self.id; let arena = self.ensure_arena::<T>(); // Fast path for when we have available capacity for // allocating into. match arena.try_alloc(heap_id, value) { Ok(root) => root, Err(value) => self.alloc_slow(value), } }
// Out-of-line slow path for when we need to GC to free // up or allocate additional space. #[inline(never)] fn alloc_slow<T>(&mut self, value: T) -> Root<T> where T: Trace, { self.gc(); let heap_id = self.id; let arena = self.ensure_arena::<T>(); arena.alloc_slow(heap_id, value) } }Arena<T> allocation bottoms out in allocating from a FreeList<T>, which will attempt to use existing capacity by popping off its internal list of empty entries when possible, or otherwise fall back to reserving additional capacity.impl<T> FreeList<T> { fn try_alloc(&mut self, value: T) -> Result<u32, T> { if let Some(index) = self.free { // We have capacity. Pop the first free entry off // the free list and put the value in there.
let index = usize::try_from(index).unwrap(); let next_free = match self.entries[index] { Entry::Free(next_free) => next_free, Entry::Occupied { .. } => unreachable!(), }; self.free = next_free; self.entries[index] = Entry::Occupied(value); Ok(index) } else { // No capacity to hold the value; give it back. Err(value) } }
fn alloc(&mut self, value: T) -> u32 { self.try_alloc(value).unwrap_or_else(|value| { // Reserve additional capacity, since we didn't have // space for the allocation. self.double_capacity(); // After which the allocation will succeed. self.try_alloc(value).ok().unwrap() }) } }Accessing objects in the heap is straightforward: look up the arena for T and index into it.impl Heap { /// Get a shared borrow of the referenced `T`. pub fn get<T>(&self, gc: impl Into<Gc<T>>) -> &T where T: Trace, { let gc = gc.into(); assert_eq!(self.id, gc.heap_id); let arena = self.arena::<T>().unwrap(); arena.elements.get(gc.index) }
// Get an exclusive borrow of the referenced `T`. pub fn get_mut<T>(&mut self, gc: impl Into<Gc<T>>) -> &mut T where T: Trace, { let gc = gc.into(); assert_eq!(self.id, gc.heap_id); let arena = self.arena_mut::<T>().unwrap(); arena.elements.get_mut(gc.index) } }Before we get into how safe-gc actually performs garbage collection, we need to look at how it implements the root set. The root set are the set of things that are definitely alive; things that the application is actively using right now or planning to use in the future. The goal of the collector is to identify all objects transitively referenced by these roots, since these are the objects that can still be used in the future, and recycle all others.Each Arena<T> has its own RootSet<T>.
For simplicity a RootSet<T> is a wrapper around a FreeList<Gc<T>>. When we add new roots, we insert them into the FreeList, and when we drop a root, we remove it from the FreeList. This does mean that the root set can contain duplicates and is therefore not a proper set. The root set’s FreeList is additionally wrapped in an Rc<RefCell<...>> so that we can implement Clone for Root<T>, which adds another entry in the root set, and don’t need to explicitly pass around a Heap to hold additional references to a rooted object.Finally, I took care to design Root<T> and RootSet<T> such that Root<T> doesn’t directly hold a Gc<T>. This allows for updating rooted GC pointers after a collection, which is necessary for moving GC algorithms like generational GC and compaction. In fact, I originally intended to implement a copying collector, which is a moving GC algorithm, for safe-gc but ran into some issues. More on those later. For now, we retain the possibility of introducing moving GC at a later date.struct Arena<T> { // ...
// Each arena has a root set. roots: RootSet<T>, }
// The set of rooted `T`s in an arena. struct RootSet<T> { inner: Rc<RefCell<FreeList<Gc<T>>>>, }
impl<T: Trace> RootSet<T> { // Rooting a `Gc<T>` adds an entry to the root set. fn insert(&self, gc: Gc<T>) -> Root<T> { let mut inner = self.inner.borrow_mut(); let index = inner.alloc(gc); Root { roots: self.clone(), index, } }
fn remove(&self, index: u32) { let mut inner = self.inner.borrow_mut(); inner.dealloc(index); } }
pub struct Root<T: Trace> { // Each `Root<T>` holds a reference to the root set. roots: RootSet<T>,
// Index of this root in the root set. index: u32, }
// Dropping a `Root<T>` removes its entry from the root set.