Over the past year, I’ve been working on fixing C by giving it a high quality, ultra portable standard library. It is not a simple wrapper on top of libc; it doesn’t depend on libc except when required to by the platform. To my knowledge, there is nothing like it.

The library is called sp.h[1]. It’s a 15,000 line, single header library written in plain C99. You can find the source code on GitHub, which includes the library itself, lots of example programs, and half a dozen baseball libraries[2] which extend the core. If you prefer to read a few examples and look through the source, head to GitHub first. Otherwise, let’s get on with the pitch!

Table of Contents

- Principles
  - Program directly against syscalls
  - Libc is actively harmful
  - There Is No Heap
  - Null-terminated strings are the devil’s work
  - Be a part of your software, not aside from it
  - Be extremely portable
  - Be explicit
- Non-goals
  - Conformance to existing interfaces
  - Obscure architectures and OSes
  - Performance
- A parting thought
  - C is valuable because it’s simple
  - I want to work with you

Principles

Program directly against syscalls

The fundamental idea is that any C standard library must be written directly against the lowest level primitives available[3]. It is neither useful nor productive to try to emulate, produce, or interface with the decades of cruft that have accumulated between the OS and the code that you yourself write.

Libc is actively harmful

It is tempting to conform to libc, because swaths of code promise to compile and run if you can simply provide an implementation of libc. But more and more, this is untrue.

Libc does not provide a useful interface for any program. Simple programs would rather use a high level language. Sophisticated programs cannot be written with the primitives that it provides. This has been exacerbated over the past decade as asynchronous programming has become more important. A “fast” program is becoming less about solving e.g. register allocation better than the other compiler and more about e.g. using the right kernel primitives to do IO.

Any interface upon which the fundamental unit of IO is FILE* or upon which a substring is a malformed idea is not just annoying.
It’s harmful. sp.h casts it aside[4].

There Is No Heap

These types underpin the entire library:

    typedef enum {
      SP_ALLOCATOR_MODE_ALLOC,
      SP_ALLOCATOR_MODE_FREE,
      SP_ALLOCATOR_MODE_RESIZE,
    } sp_mem_alloc_mode_t;

    SP_TYPEDEF_FN(
      void*,
      sp_allocator_fn_t,
      void* user_data, sp_mem_alloc_mode_t mode, u64 size, void* ptr
    );

    typedef struct sp_allocator_t {
      sp_allocator_fn_t on_alloc;
      void* user_data;
    } sp_mem_t;

In other words, allocators. They do so by forcing programs to accept that “the ability to allocate any amount of memory from the ether” is not a primitive; it is a fiction. The operating system hands out pages. The runtime on top of it, most often called via malloc(), is what implements the often useful fiction that non-page-sized amounts of memory can be allocated.

Memory is not owned by “the runtime” or “the heap”. Memory is owned by your program. If malloc()-shaped heap allocations are what your program wants, then that’s great! There’s nothing wrong with that. But in my experience, that is an unfortunate default rather than something that is true, and this library seeks to make it opt-in rather than opt-out.

Null-terminated strings are the devil’s work

I have written about this in the past. Null terminated strings mean you cannot:

- Return a non-owning substring
- Know the length of a string in O(1)
- Write lexers and parsers which return ergonomic views into source
- Build strings without invalid intermediate values

Plus, of course, the unfathomable number of bugs and security issues that arise from a missing null terminator. Step one to modernizing C is to completely ditch null terminated strings in favor of the humble sp_str_t.

The only downside, I believed, was that you were forced to make an extra copy to interface with any other C API you might come across. I have come to find that this is completely meaningless.

A C standard library built natively around pointer + length strings is shockingly ergonomic.
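
To make the list of limitations above concrete, here is a minimal sketch of a pointer + length string. The names view_t, view_from_cstr, view_sub, view_eq, and view_next_token are hypothetical and invented for illustration. This is not sp.h's sp_str_t or its API, only an assumption about the general shape such a type might take.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer + length string; NOT sp.h's sp_str_t. */
typedef struct {
    const char *data; /* bytes; not required to be null-terminated */
    size_t len;       /* length in bytes, available in O(1) */
} view_t;

/* Wrap a null-terminated C string; the only place strlen() is paid. */
static view_t view_from_cstr(const char *s) {
    view_t v = { s, strlen(s) };
    return v;
}

/* Non-owning substring [start, start + len): no copy, no allocation. */
static view_t view_sub(view_t v, size_t start, size_t len) {
    assert(start <= v.len && len <= v.len - start);
    view_t out = { v.data + start, len };
    return out;
}

/* Byte-wise equality; lengths are compared before any bytes are read. */
static int view_eq(view_t a, view_t b) {
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}

/* Pop the next token delimited by `sep` off the front of *rest.
   Returns 1 and fills *token, or 0 once *rest is exhausted.
   Each token is a view into the original buffer. */
static int view_next_token(view_t *rest, char sep, view_t *token) {
    if (rest->data == NULL) return 0;
    const char *hit = rest->len ? memchr(rest->data, sep, rest->len) : NULL;
    if (hit == NULL) {
        /* Last token: hand back what is left and mark *rest as done. */
        *token = *rest;
        rest->data = NULL;
        rest->len = 0;
        return 1;
    }
    size_t n = (size_t)(hit - rest->data);
    *token = view_sub(*rest, 0, n);
    *rest = view_sub(*rest, n + 1, rest->len - n - 1);
    return 1;
}
```

A caller would loop with `while (view_next_token(&rest, ',', &tok))`. Every token points back into the source buffer, so splitting needs no allocator at all in this sketch, and empty fields come back as zero-length views rather than as special cases.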
For example, a snippet from a wc clone:

    sp_str_t content = sp_zero;
    sp_io_read_file(mem, path, &content);

    sp_ht(sp_str_t, u32) counts = sp_zero;
    sp_str_ht_init(mem, counts);

    sp_da(sp_str_t) lines = sp_str_split_c8(mem, content, '\n');
    sp_da_for(lines, i) {
      sp_da(sp_str_t) words = sp_str_split_c8(mem, lines[i], ' ');

      sp_da_for(words, j) {
        u32* count = sp_str_ht_get(counts, words[j]);
        if (count) {
          *count = *count + 1;
        }
        else {
          sp_str_ht_insert(counts, words[j], 1);
        }
      }
    }

If your first reaction is “so what?”, then, yeah, that’s the point. Here’s a piece of C code which reads roughly like any high level language but also never copies data from the source buffer while parsing. In other words, it’s both the most ergonomic version and the most performant version.

Be a part of your software, not aside from it

The library is meant to be read, modified, tweaked, rewritten, or whatever verb you might need to have it serve your purposes. I’ve worked very hard to this end:

- The core of the library is ~40 syscalls which are the only platform specific code[5]
- The library ships as a single file which needs no configuration
- The file is extremely organized, and tagged with @tags for human or LLM search
- Every function is part of a namespace

Where the frustrating parts of C seek to hide the OS your program runs on behind an elaborate fiction, sp.h seeks to unify only those things which are true, as thinly as possible while being useful, and then to build functionality on top of the exact same primitives that it gives you.

Be extremely portable

sp.h is written in C99, and it compiles against any compiler and libc imaginable. It works on Linux, on Windows, on macOS. It works under a WASM host. It works in the browser. It works with MSVC and MinGW. It works with or without libc, or with weird ones like Cosmopolitan.
It works with the big compilers and it works with TCC.

And, best of all, it does all of that because it’s small, not because it’s big.

Be explicit

Every time I’ve picked implicit over explicit, I’ve come to regret it and paid the price to fix it:

- Errors are always returned and handled by the caller
- Programs do not have mutable global state
- Functions which allocate take an allocator
- Memory is zero initialized

Non-goals

Conformance to existing interfaces

This is not libc. When required to, sp.h will respect libc, and it will always work unobtrusively and completely when embedded in a libc-using program. But it is not libc, and you should not expect it to act like it is.

Obscure architectures and OSes

I write code for x86_64 and aarch64. WASM is becoming more important, but is still secondary to native targets. I don’t care to bloat the library to support a tiny fraction of use cases.

That being said, if you’re interested in using the library on an unsupported platform, I’m more than happy to help, and if we can make the patch reasonable, to merge it.

Performance

The library’s stance, to put it simply, is that the juice ain’t worth the squeeze when it comes to low level, compute-bound performance.

Designing software and data structures for performance against unknown use cases on unknown hardware is extremely difficult and the resulting code is much more complicated. Even then, it’s often better to use code written against your actual use case and hardware when performance is that critical.

Things that are off the table might be:

- SIMD
- A highly optimized hash table rewrite
- Figuring out where inlining or LIKELY causes the compiler to produce better code

Things that are on the table might be:

- Providing the correct abstractions to do optimized and/or zero copy IO
- Writing APIs that do not require copying data

Of course, doing fine-grained optimization where it’s hurting people is always on the table. Fixing bugs is always on the table. I am not anti optimization; just busy.

A parting thought

The natural question one might have is: Why are you doing this? There have never been more or better languages for systems programming. Why not just use one?

The answer is that C holds a real niche, and not one wholly built on legacy.
To my knowledge, it’s the only language which:

- Can be directly compiled to any machine code imaginable
- Has an ecosystem of state-of-the-art optimizing compilers
- Is written in the same language as the OS and most libraries
- You could write a reasonable compiler for as a personal project

In other words…

C is valuable because it’s simple

Of course, these are all unfair to varying degrees. LLVM exists, so technically everyone has a SOTA compiler. Most languages have FFIs and tooling. The best systems languages are better at C than C is.

And yet, to have something so well-supported, so optimized, so tied to the platforms upon which we write native code, and so approachable is magical.

I want to work with you

I would like nothing more than to make friends and/or help you work on this library, stranger. I’ll help you port it to your weird environment. I’ll explain any of it to you. I’ll listen politely while you tell me I’m terrible at programming. I am certainly no genius at systems programming; everything I have is the product of really bad misunderstandings about how software and computers work, followed by lots of hard work and fun and more software.

I’m on a Discord server or you can find me at #sp on IRC. You can also email me. The domain’s the same as this site, and the handle is my last name[6].

[1] The first two letters of my last name. A little vanity never hurt, right?

[2] They add to a single header library, so they’re double header libraries. Doubleheader.

[3] Where “syscall” means “the lowest level primitive available”. On Linux, it’s always actual syscalls. On Windows, that’s usually NT. On macOS, it’s usually the syscall-wrapper subset of libc because you’re forced to link libc and it’s not quite as open as Linux (although there is a rich “undocumented” set of APIs and syscalls that are very interesting).

[4] There are some places where the library is still more POSIX-shaped than it ought to be in its lowest levels. But, hey, that’s what an alpha’s for, right?

[5] This is probably 85% true right now. There are a few stragglers; mostly things that I haven’t had time to properly design as the absolute minimum set of primitives and which therefore live outside the core.