Better is faster

Mutation as a compiler optimization

Rust has recently popularized the idea of statically analyzed ownership (or substructural typing, if you prefer math to engineering). This is useful for statically guaranteeing memory safety in a procedural language, but some of the most interesting benefits of such a system go unused by Rust's imperative style.

Rust's management of ownership is related to the idea of 'unique' types. A unique type guarantees that there is only one location in memory that points to a given object at a given time. Rust uses this largely for thread safety and for increasing the number of guarantees the compiler can rely on when optimizing. This property can also be used to enforce referential transparency, or freedom from unexpected side effects.

With an imperative style, guaranteeing this property involves a number of rules the compiler must check, and in some cases even requires manual user intervention in managing variable lifetimes. With a typical functional style, however, only these rules need to be followed to guarantee that any given value is referenced by only one variable:

  1. A variable may only be consumed in zero or one location in a function body.
  2. If a variable is stored in another data structure through a constructor, it is consumed, because that structure now contains a permanent reference to the variable.
  3. When a variable is destructured through pattern matching, it is consumed, because that variable has been permanently renamed through destructuring.
  4. When a variable is passed into a function, and that function consumes the variable in its body, that variable is consumed.
  5. When a function is called, it is consumed if it returns any part of its closure (whether it is self-consuming is part of its type).

For example, in Haskelly syntax, this implementation of a repeat function is not allowed.

repeat :: Number -> a -> [a]
repeat 0 _ = []
repeat n element = element:(repeat (n - 1) element)

Here you can see that element is inserted into a list, which takes a reference and consumes it, but it is also passed recursively into repeat, which will consume it again. This would require an indefinite number of references to the original element of the list, breaking our foundational rule that any given value can be referenced by only one location.
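By way of contrast, here is a sketch in C of what a legal repeat has to do: each cell receives an explicit copy of the element, so no value ever has more than one reference. (The caller-supplied `copy` function is my own illustrative addition, not part of the language sketched above.)

```c
#include <stdlib.h>

struct list {
    void *value;
    struct list *next;
};

// A repeat the rules above would permit: each cell gets its own copy of
// the element, so no value is ever referenced from two places. The
// original element is consumed exactly once, by the base case, which
// frees it because it goes unused there.
struct list *repeat(int n, void *element, void *(*copy)(void *)) {
    if (n == 0) {
        free(element);  // consumed but unused: freed immediately
        return NULL;
    }
    struct list *cell = malloc(sizeof *cell);
    cell->value = copy(element);                // this cell's own reference
    cell->next = repeat(n - 1, element, copy);  // recursion consumes the original
    return cell;
}
```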

On the other hand, this direct implementation of map is permissible (for a function that can be called multiple times).

map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = (f x):(map f xs)

Here, each variable is referenced only once, except for f, which is fine: calling f doesn't consume it, because calling a function doesn't create any lasting memory locations that point to it.

Now, one part of the value of this guarantee for a functional-style language is that if a parameter isn't used in a function body, you know it can be freed immediately. This completely eliminates the need for a garbage collector to scan the heap at runtime and figure out what's currently live.
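As a sketch of what this permits (in C, with hypothetical names of mine), a function that consumes an argument without using it can have its free emitted statically at the top of its body:

```c
#include <stdlib.h>

struct list {
    void *value;
    struct list *next;
};

// Analogous to const :: a -> b -> a: the second argument is consumed
// but never used. The uniqueness rules guarantee no other reference to
// `dropped` exists, so the compiler can emit this free statically.
void *const_first(void *kept, struct list *dropped) {
    free(dropped);  // inserted at compile time, not by a collector
    return kept;
}
```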

Slightly less obvious is the ability to implicitly define a mutating operation. The map function above can be compiled into a simple, non-recursive loop that directly replaces each cell with f x! When a function argument is deconstructed, as in x:xs, we know that the original object (in this case a list cell) is no longer referred to by any variable. Instead of freeing that object, we can place the new data from (f x):(map f xs) into that same memory location, statically.
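A minimal sketch of that reuse in C (the names are mine, not from any particular compiler): an in-place map over integer cells overwrites each cell's value field rather than allocating new cells, so the output list occupies exactly the memory of the input.

```c
#include <stdlib.h>

struct list {
    int value;
    struct list *next;
};

// Each input cell is uniquely referenced, so rather than freeing it and
// allocating a fresh output cell, we write f(x) into the same memory.
// The "new" list is the old list's cells, updated in place.
struct list *map_in_place(int (*f)(int), struct list *xs) {
    for (struct list *c = xs; c != NULL; c = c->next) {
        c->value = f(c->value);  // reuse the cell's storage directly
    }
    return xs;
}
```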

There's one last ingredient that lets us turn this into a loop. You might have noticed that the recursive call isn't in proper tail position. If functions return values by writing through a reference passed as their first argument, rather than returning on the stack, then as long as only one element of the final returned object is a recursive call, the function can be linearized into tail-call position. For the above, in C-ish syntax:

map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = (f x):(map f xs)

===>

struct list {
    void *value;
    struct list *next;
};

// "result" is the location where the return value of this call is placed.
void map(
    struct list **result,
    void (*f)(void **, void *),
    struct list *initial)
{
tailcall:
    // Assuming the empty list is represented by null.
    if (initial == 0x0) {
        *result = 0x0;
    } else {
        // The input cell is uniquely referenced, so reuse it in place.
        *result = initial;
        f(&initial->value, initial->value);
        result = &initial->next;
        initial = initial->next;
        goto tailcall;
    }
}

Altogether, this lets us have a purely functional programming language with predictable, fast performance behavior. The performance-focused functional language design I've seen has typically taken a relatively unrestricted language and found specific circumstances where it can be optimized. This system seems to promise a performance model closer to that of "systems" languages, avoiding the situation where changing a couple of lines suddenly deoptimizes huge swaths of code, and mostly avoiding unpredictable garbage collection behavior. There are certainly some sacrifices, in that you can only directly represent trees with unique pointers, so arbitrary graphs require some manual indirection with number indices.
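As a sketch of that indirection (the names are mine): a graph whose edges would otherwise need shared pointers can store its nodes in one uniquely owned array and represent edges as integer indices into it.

```c
#include <stddef.h>

#define MAX_EDGES 4

// Unique pointers can only express trees: a shared or cyclic edge would
// mean two references to one node. Storing the nodes in one uniquely
// owned array and representing edges as indices into it sidesteps this,
// since the array is the single reference to every node.
struct node {
    int value;
    size_t edges[MAX_EDGES];  // indices into the graph's node array
    size_t edge_count;
};

struct graph {
    struct node *nodes;  // the one unique pointer to all nodes
    size_t count;
};
```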

We also need a custom allocator for this: if we want to constantly allocate and free tiny memory cells, as functional languages often do, we can't tolerate typical allocators, which tend to be optimized for larger, longer-lived allocations. I talk about an allocator targeted at this system in the next post.