Borrow-checking without type-checking

S scattered-thoughts.net ↗

▲ 91 points • 32 comments • by jamii • 4w ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is fully human-written

1 %

AI likelihood · overall

Human

100% human-written 0% AI-generated

SEGMENTS · HUMAN 6 of 6

SEGMENTS · AI 0 of 6

WORD COUNT 1,950

PEAK AI % 5% · §4

Analyzed

Apr 23

backend: pangram/v3.3

Segments scanned

6 windows

avg 325 words each

Distribution

100 / 0%

human / AI fraction

Verdict

Human

Pangram v3.3

Article text · 1,950 words · 6 segments analyzed

Human AI-generated

§1 Human · 1%

This is a demo of a toy language with dynamic typing, inline values, stack allocation, interior pointers, single ownership, and a limited form of borrowing - less expressive than rust, but much more expressive than second-class references (eg we can express external iterators). Since there is no static typing the borrows must be checked dynamically. The interesting part of the demo is that we can do that fairly cheaply and with useful error messages. The code is here. background I'm exploring a style of type-system exemplified by julia and zig. Both languages start with a dynamic type system, enforced by dynamic type-checks, and then layer on a static type system which is capable of proving that the dynamic type-checks are unnecessary. The dynamic type system provides flexibility and easy meta-programming, while the static type system removes the overhead in most of your code. Julia and zig differ slightly in how they handle code that cannot be statically type-checked. Zig will refuse to compile the code at all, while julia will leave some dynamic type-checks and will run the static type-checks again when more type information is available. For zest I'm exploring a third option - code can either be dynamically typed (and interpreted), or statically typed (and compiled), but switching between the two requires explicit annotations. The goal is that most of your code can have the assurances of static typing, but you can still opt in to dynamically-typed glue code to handle repls, live code reloading, runtime code generation, malleable software etc. The tricky part is that I also want to enforce mutable value semantics. To date there are two main strategies for doing this: Reference-counting and copy-on-write, which imposes an unpleasant performance overhead and is hard to combine with interior pointers / explicit stack allocation. Static type systems, which won't help me with my dynamically-typed language. So I have to do something new. This is what I've come up with: The overhead of borrow-checking is limited to some reference-counting operations when creating/dropping/copying references. The reference counts themselves are always stored on the stack so the cache impact is low. Reference counts are never shared between threads, so we don't have to use atomic operations to update them. The reference counting overhead is only paid in dynamically-typed function frames. Reference counts are never allocated on the heap and statically-typed code never has to see them.

§2 Human · 1%

Whenever a borrow-checking rule is violated, the runtime immediately throws an error which identifies the exact value that is at fault. And as a bonus, I'm at least 60% certain that this scheme is actually sound :) repl First a quick note: If you see a green tick below then all the examples are interactive. You can edit the code and hit the eval button to see the result. If you see a red cross instead then probably you have javascript turned off or I didn't test on your browser, and you'll have to make do with the offline result at the end of the code box. ✗ values Our toy language is pretty minimal. We have integers, tuples, functions, and some basic control flow. let nums = [1, 5]; let inc = fn (i) {i + 1}; while {nums[0] < nums[1]} { nums[0] = inc(nums[0]); }; nums [5, 5] Every variable is an independent value. Mutating the value in one variable never affects the value in a different variable. let a = [1, [2, 3]]; // `b` is an independent copy of `a` let b = a; b[1][0] = 42; // `a` is unchanged [a, b] [[1, [2, 3]], [1, [42, 3]]] At the end of each block ({}), for every variable defined in that block we drop the associated value, freeing any memory used by that value. let a = [1, [2, 3]]; { let b = [4, 5]; let c = 6; // c is dropped here // b is dropped here }; // a is dropped here [] This approach to combining value semantics with mutation is simple and easy to implement. But it's also useless. Creating a full copy of every value on every use is infeasible when we want to work with bigger values. We need a way to express sharing between values without breaking value semantics. references References let us express the idea of a value that is stored in a different location to its parent value. One way to make a reference is the box function which stores its contents on the heap.

§3 Human · 2%

In the example below, the value [2, 3] is stored on the heap and the value [1, box(...)] is stored on the stack. [1, box([2, 3])] [1, box([2, 3])] The dereference operator * is used to reach inside a reference to access its contents. let a = [1, box([2, 3])]; a[1]*[0] 2 So what should the code below do? let a = [1, box([2, 3])]; let b = a; // What does this mean? b*[0] = 42; a We could copy the contents of the box, but that's not a solution that scales to arbitrarily large data-structures. We could end up copying gigabytes of memory! Or we could just copy the pointer itself and share the heap allocation between a and b. But then the assignment to b*[0] will be visible in a, breaking the illusion of value semantics. What we actually do, unless you add more explicit annotations, is to just refuse to copy boxes at all. let a = [1, box([2, 3])]; let b = a[1]; b*[0] = 42; b* Error at 2:9 Can't copy an owned reference To work with references we have to be more specific about what we mean. We have a few options. The first option is to move the value using ^. This gives us a copy of the reference but destroys the original! let a = [1, box([2, 3])]; let b = a[1]^; b*[0] = 42; b* [42, 3] We leave those XXX in the diagram to show that the original reference has been destroyed and nothing has replaced it yet. What exactly the XXX means is a question we'll leave for later. (We could just pick some zero value that is valid for all types, but that feels like a mistake). Destroyed values can be overwritten with new values.

§4 Human · 5%

§5 Human · 1%

§6 Human · 1%

if index* < len(tuple_ref**) { let elem = tuple_ref**[index*]; index* = {index* + 1}; elem } else { [] } } }; let a = [1,2]; let next = iter_copy(a&); [next!(), next!(), next!()] [1, 2, []] safety Under the hood, borrowed and shared references are implemented as pointers to the original value. Those XXXs don't actually exist in the implementation. The goal of borrow-checking is to try to have our cake and eat it - we want to provide the simplicity of value semantics while keeping the performance of reference semantics, and never let you get into a situation where you could tell the difference. Ensuring that moving, borrowing, and sharing don't break the illusion is surprisingly easy to do with a static type system, and surprisingly hard to do with a dynamic type system, or at least hard to do cheaply. This is the best I've managed so far and it's still quite restrictive compared to rust. Maybe the easiest way to explain how it works is to first list all the things that it prevents you from doing, and then talk about the implementation afterwards. The biggest restriction is that owned references (boxes) are not allowed to point to borrowed/shared references. This ensures that borrowed/shared references only live on the stack, which makes many of the other rules easier to enforce dynamically. let a = 1; let b = box(a&); Error at 2:9 Can't box a shared ref let a = box(box(1)); let b = 2; a* = b&; Error at 3:1 Can't assign a value of type `number&` to a location of type `box(number)` While a borrowed reference points to a value, no more borrowed/shared references to that value can be created. let a = 1; let b = a!; let c = a!; Error at 3:9 Can't borrow `a` because it is borrowed by `b` The runtime does track when borrowed references are dropped though. let a = 1; { let b = a!; // `b` is dropped at the end of this scope }; let c = a!; // now we can borrow `a` again [] let a = 1; let b = a!;