The lone lisp heap

M matheusmoreira.com ↗

▲ 94 points • 30 comments • by stevekemp • 4w ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is fully human-written

0 %

AI likelihood · overall

Human

100% human-written 0% AI-generated

SEGMENTS · HUMAN 5 of 5

SEGMENTS · AI 0 of 5

WORD COUNT 1,450

PEAK AI % 0% · §1

Analyzed

May 28

backend: pangram/v3.3

Segments scanned

5 windows

avg 290 words each

Distribution

100 / 0%

human / AI fraction

Verdict

Human

Pangram v3.3

Article text · 1,450 words · 5 segments analyzed

Human AI-generated

§1 Human · 0%

Like many dynamic languages out there, lone started out simple. It was essentially a collection of data structures written in C, an union comprising all of those types, a typed value structure containing the union plus metadata, and a custom language whose entire purpose is to bring all these values together into patterns that resemble working programs. struct lone_lisp_list { /* ... */ }; struct lone_lisp_vector { /* ... */ }; struct lone_lisp_table { /* ... */ }; /* ... */

enum lone_lisp_heap_value_type { LONE_LISP_TYPE_LIST, LONE_LISP_TYPE_VECTOR, LONE_LISP_TYPE_TABLE, /* ... */ };

struct lone_lisp_heap_value { bool live: 1; bool marked: 1; /* ... */

enum lone_lisp_heap_value_type type;

union { struct lone_lisp_list list; struct lone_lisp_vector vector; struct lone_lisp_table table; /* ... */ } as; };

When I put it this way, the language itself seems almost incidental to the virtual machine work. That's because it is. Lone was in fact not designed ahead of time. It was and still is being constructed in real time. It's almost as if the language itself is arising as a consequence of its own implementation. It grows in complexity alongside the knowledge and understanding of its creator. At first I was a neophyte. I was attracted to ideas that were almost painfully simple, almost too naive to cope with the harsh reality of the world. The code reflected that. Take the above structures, for example. How are such values created? My first instinct is to simply allocate some memory for each object. Call malloc with the size of the structure, then initialize the memory it returns. It's that easy. Right? Memory allocation Lone is a lisp interpreter written in freestanding C. There is no dynamic memory allocation in freestanding C. There is no such thing as malloc. There is no libc. There is only me and the code.

§2 Human · 0%

If I wanted malloc, I would have to write it myself. So I wrote it. I read up on everything I could find online about memory and its allocation. Then I made my own memory allocator. To manage a memory block, the allocator needs to describe it. Let's start with the basics. The most basic information about a piece of data is its location and its size. struct lone_memory { size_t size; unsigned char pointer[]; };

The allocator must also keep track of whether the block is free or currently in use. Otherwise, there would be chaos. struct lone_memory { bool free; size_t size; unsigned char pointer[]; };

This is enough to manage any one block of memory. If it's free, then the allocator can give the block to whoever asks. If it's not free, then it can't. If they ask for less or exactly as much memory as this block contains, then it can be given out. If they ask for more, then it can't. I started from literally nothing. Now I've got the one. Time to handle infinity. struct lone_memory { struct lone_memory *prev, *next; bool free; size_t size; unsigned char pointer[]; };

Simply link the blocks to each other. Now if some code requests a block, the allocator can walk through the list of all blocks and search for any block that fits. And that's exactly what the allocator does. for (block = system->memory; block; block = block->next) if (block->free && block->size >= size) break;

if (!block) { return 0; }

block->free = false; return block->pointer;

Searches the list of blocks and literally returns the first block that fits. This thing could end up returning 16 KiB blocks for 64 byte requests. Crude, but effective. Incredibly wasteful, of course.

§3 Human · 0%

Potentially more wasteful than mmaping pages for every single allocation. The memory allocator can do better than this. Why not cut the block up into smaller chunks? block->free = false; lone_memory_split(block, size); return block->pointer;

Split the block into two blocks: an allocated block of exactly the requested size, and a new free block for any excess memory that might remain. size_t excess = block->size - size;

/* create a new block only if there's enough space for memory block descriptor + 1 byte */

if (excess >= sizeof(struct lone_memory) + 1) { new = (struct lone_memory *) (block->pointer + size);

/* weave the new block into the linked lists */

new->free = true; new->size = excess - sizeof(struct lone_memory); block->size = size; }

The excess memory block gets conjured up out of thin air: the allocator simply drops a new memory block descriptor right after the end of the previous memory block. When the links are established, the excess memory can be allocated like any other memory block. Memory deallocation is even simpler: just mark the block as free. The block descriptor is just behind the pointer, trivially reachable. struct lone_memory *block = ((struct lone_memory *) pointer) - 1; block->free = true;

This is enough, but the allocator can do better. It can check if surrounding blocks are also free, and undo the split if that is the case. lone_memory_coalesce(block); lone_memory_coalesce(block->prev);

Coalescing a memory block is simple. When two adjacent blocks are free, one block literally absorbs the other, block descriptor and all.

§4 Human · 0%

if (block && block->free) { next = block->next; if (next && next->free) { block->size += next->size + sizeof(struct lone_memory); /* unlink the blocks */ } }

That's it. A simple yet complete first fit, split/coalesce memory allocator. Give it a large block of memory to manage and watch as it cuts the block up into small chunks and hands them all out to anyone who asks. #define LONE_MEMORY_SIZE (64 * 1024) static unsigned char memory[LONE_MEMORY_SIZE];

lone_lisp_initialize(&lone, memory, sizeof(memory));

How good is this memory allocator? Let's just say that terrible doesn't quite do it justice. This thing will linearly scan the list of blocks for every single allocation. It will fail pretty hard at keeping memory fragmentation under control, which not only wastes memory but could result in entirely avoidable out-of-memory situations. Small leftover blocks tend to accumulate near the start of the list, further compounding the problems as it runs. Prefixing block descriptors to every block means metadata overhead of around 30% to 50% for typical allocations, not to mention the fact those headers are classic targets for exploitation. The shortcomings of this allocator could fill up an entire article all by themselves. Nevertheless, lone made do with this thing for around three years. For all of its flaws, it absolutely does allocate memory, and memory was all that I needed to make some objects. I didn't really care about the allocator, I just needed some values I could play with as I developed the language. Yeah, I'll totally go ahead and do that now. Allocate some values, I mean. Everything's gonna be simple now. Right? Scanning for pointers It's not that simple. Other parts of the code have requirements that must be fulfilled. The lone lisp garbage collector is conservative.

§5 Human · 0%

It walks the stack and scrutinizes everything it finds. It wants to know whether they are pointers to lisp objects. In order to do that, it must maintain a list of every single lisp value pointer. Sounds simple when I put it that way. Why not just do it? Just scan the stack and compare every single word against every single lisp pointer, right? Right?! Yeah, no. That's not going to happen. Well, not unless you want to get into the quadratic hall of shame. There's got to be a way, some clever data structure to make the problem go away... The first heap Wait, I think I know what to do. During my tenure in the Ruby mines many eons ago, I just so happened to pick up some tricks. That includes this one: struct lone_lisp_heap { struct lone_lisp_heap *next; struct lone_lisp_heap_value values[LONE_LISP_HEAP_VALUE_COUNT]; };

A linked list of arrays of values! General purpose memory allocators provide no guarantees; those must be forged by hand by way of data structures. Now objects aren't individually allocated any more, the language buys them in bulk. The lone heap is essentially a list of nice little chunks of lisp objects, all lying right next to each other with zero gaps in between them. Now the garbage collector can simply check if pointers fall within range and call it a day. Alright! Mechanically, it's almost identical to the general purpose memory allocator. It just works at the lisp value level instead of working at the byte level. There's a linked list of chunks, it walks through them all. For each chunk, it loops over the values. If it finds a dead value, it just returns it. Chunks are allocated with all values dead. The garbage collector just kills the values in those slots instead of deallocating them. The allocator isn't really allocating values, it's resurrecting them.