Inside the RunMat Runtime: a Rust-like compiler pipeline to resolve MATLAB language semantics
If you have only seen MATLAB from the outside, it is easy to think of it as a matrix-oriented scripting language: arrays, plots, numerical routines, and a prompt where engineers try things quickly. That is all true, and it is also why MATLAB is still relevant: it is one of the languages engineers and scientists are taught to think in when they learn applied math, simulation, controls, signal processing, optimization, and numerical computing. A lot of teams keep using it because their models, tools, habits, and domain knowledge are already there.

MATLAB is also a large language and runtime environment for scientific and engineering programs. Real MATLAB code is often split across many files. It uses package folders, class folders, private helper folders, function handles, dynamically resolved calls, workspace state, object-oriented classdef classes, multiple return values, overloaded indexing, and interactive execution. A lot of long-lived engineering code depends on those semantics.

RunMat is a Rust runtime and compiler for that MATLAB-family code. As a result, it is not only trying to evaluate simple expressions like A + B. It needs to execute real-world programs that look more like this, while preserving the semantics that the original program depends on:

    model = SignalModel(samples);
    [filtered, stats] = filters.lowpass(model.samples, 30);
    model.history{end + 1} = stats;
    plot(filtered);

To run that correctly, RunMat makes several language questions explicit during compilation and execution:
- Is SignalModel a variable, a function, a class constructor, or an unresolved external name?
- If it is a variable, is this an attempted array/object call rather than construction?
- Does filters.lowpass refer to a package function, a static method, or a field access followed by a call?
- How many outputs did the caller request from lowpass, and can the callee observe that through nargout?
- Is model.samples a stored property, a dependent property, or an overloaded member access?
- What does end mean inside model.history{end + 1}?
- Should plot(filtered) return a value, update figure state, or suppress ordinary output?
Those are not parsing questions. They are semantic questions. For a team trying to run existing engineering code, they are also compatibility questions.
If the runtime answers them too late, or answers them differently in the interpreter, language server, JIT, and GPU planner, the system becomes hard to reason about. RunMat resolves those decisions through a staged compiler pipeline:

    source -> AST -> semantic HIR -> MIR -> MIR analysis -> VM layout + bytecode -> runtime/providers

That staging is the useful connection to Rust: source is lowered into progressively more explicit compiler products before execution. Names, functions, classes, bindings, effects, output counts, indexing contexts, and runtime layout become facts that later stages can reuse.

This post explains why MATLAB needs that kind of pipeline and how RunMat's runtime is structured today.

Why MATLAB needs semantic resolution

MATLAB syntax is compact because a lot of meaning is supplied by context. That is part of why the language is productive for numerical work. It is also why MATLAB is hard to execute correctly, let alone to resolve statically: the same surface syntax can mean several different things, and which one is intended depends on the context. Return to the opening example:
- SignalModel(samples) might be a class constructor, a function call, a variable being indexed, or an unresolved external name.
- filters.lowpass might be a package function, a static method, or a field access followed by a call.
- [filtered, stats] = ... requests two outputs, and that output count can affect how the callee behaves through nargout or varargout.
- model.samples might be a stored property, a dependent property, or overloaded member access.
- model.history{end + 1} is not only indexing; it is an assignment target whose end depends on the current value of history.
- plot(filtered) is a call with figure and display effects.
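The first item in that list can be pictured as a lookup that returns a recorded role. The sketch below is a hypothetical illustration, not RunMat's actual resolver: `NameRole`, `Scope`, and `resolve` are names invented here. The lookup order is a simplifying assumption modeled on MATLAB's rule that a variable shadows a function of the same name; real resolution also has to consider imports, packages, and private folders.

```rust
use std::collections::HashSet;

/// The role a bare name such as `SignalModel` resolved to.
/// Hypothetical type for illustration; not RunMat's actual API.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NameRole {
    Variable,
    ClassConstructor,
    Function,
    UnresolvedExternal,
}

/// Names known at one point in the program (illustrative only).
#[derive(Debug, Default)]
pub struct Scope {
    pub variables: HashSet<String>,
    pub classes: HashSet<String>,
    pub functions: HashSet<String>,
}

impl Scope {
    /// Resolve `name` once, so that later stages can reuse the answer.
    /// Assumption: a variable shadows a class or function of the same name.
    pub fn resolve(&self, name: &str) -> NameRole {
        if self.variables.contains(name) {
            NameRole::Variable
        } else if self.classes.contains(name) {
            NameRole::ClassConstructor
        } else if self.functions.contains(name) {
            NameRole::Function
        } else {
            NameRole::UnresolvedExternal
        }
    }
}
```

In this sketch, `SignalModel(samples)` would be recorded as a constructor call only while no variable named `SignalModel` is in scope; once such a variable exists, the same text is an indexing operation.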
The same surface syntax can therefore mean several different things. RunMat's compiler records which role each name or expression resolved to, so later runtime paths can share the same answer.

This is why we chose to put a semantic stage in the pipeline. A direct AST-to-bytecode runtime was possible (and was our previous approach), but it tended to scatter these decisions across the interpreter, runtime dispatch, editor tooling, JIT, and acceleration planner. From version 0.5 onward, RunMat resolves the source into shared language facts first.
That gives the rest of the system one place to ask what the program means, and it lets those facts match what existing MATLAB users expect their code to do.

Pipeline overview

The current pipeline is:

    MATLAB source -> parser AST -> semantic HIR assembly -> MIR assembly -> MIR analysis store -> VM assembly layout -> bytecode -> interpreter/JIT/runtime providers

Each stage has a different job. The AST preserves syntactic structure. Semantic HIR says what the source means as a MATLAB program: which functions exist, which names bind to which language entities, which classes are defined, which calls request which outputs, and which bindings should be visible in the workspace. MIR makes control flow, places, calls, and effects easier to analyze and compile. Analysis attaches facts. VM layout maps semantic bindings into executable frame slots. Bytecode is what the interpreter and JIT consume. Runtime providers handle concrete execution services such as builtin dispatch, plotting, filesystem access, workspace materialization, and acceleration.

The docs version of those layers starts with the Compilation Pipeline overview, then goes deeper on High-Level IR (HIR), Mid-Level IR (MIR), MIR & Static Analysis, and Bytecode Compilation.

The benefit of this separation is that language decisions become reusable. The interpreter can execute them, the language server can explain them, the JIT can compile through them, and the acceleration planner can use them as inputs. The compiler does not have to rediscover from VM stack shape that a call was a two-output call, that an index was a deletion target, or that a function was a nested capture.

The bytecode product also becomes more useful. It is VM bytecode, not platform-specific machine code, so it can be consumed by the interpreter, the JIT, snapshots, and tooling.
That is also the direction that makes future ahead-of-time compilation or static binary packaging plausible: the source can lower into stable semantic and bytecode products before a later target decides how to execute or package them. Static binaries would still need to carry or link the relevant RunMat runtime services, and highly dynamic MATLAB features would still need explicit policies, but the architecture no longer ties execution to reinterpreting source text.
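As a rough illustration of how a fact can travel through the stages instead of being rediscovered, the sketch below carries a call's requested output count from a HIR-like record through a MIR-like record into a bytecode-like instruction. Every type and function name here is hypothetical, the products are reduced to a single call, and the "last output stored first" order is an assumption made only so the example is concrete.

```rust
/// Hypothetical, heavily simplified stage products for one call.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HirCall {
    pub callee: String,
    pub requested_outputs: usize,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MirCall {
    pub callee: String,
    pub requested_outputs: usize,
    /// One destination local per requested output.
    pub dest_locals: Vec<usize>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Bytecode {
    Call { callee: String, nargout: usize },
    Store { slot: usize },
}

/// Lower the semantic call, allocating consecutive locals for its outputs.
pub fn lower_to_mir(call: &HirCall, first_free_local: usize) -> MirCall {
    MirCall {
        callee: call.callee.clone(),
        requested_outputs: call.requested_outputs,
        dest_locals: (first_free_local..first_free_local + call.requested_outputs).collect(),
    }
}

/// Emit instructions; the output count is copied, never re-derived.
pub fn emit_bytecode(call: &MirCall) -> Vec<Bytecode> {
    let mut code = vec![Bytecode::Call {
        callee: call.callee.clone(),
        nargout: call.requested_outputs,
    }];
    // Assumption: outputs sit on a stack with the last one on top,
    // so they are stored in reverse order.
    for &slot in call.dest_locals.iter().rev() {
        code.push(Bytecode::Store { slot });
    }
    code
}
```

For `[filtered, stats] = filters.lowpass(...)`, the count of two is written once at the HIR-like level and every later product simply reads it.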
Semantic HIR: the language product

RunMat's semantic HIR is the compiler's record of what the MATLAB program means after source layout and names have been resolved. It is represented as an assembly. An assembly owns tables for:
- modules
- entrypoints
- functions
- classes
- bindings
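A common way to build tables like these in Rust is to index them with small typed IDs. The sketch below shows that pattern for two of the five tables only (functions and bindings). The names `FunctionId`, `BindingId`, `Binding`, and `Assembly` are assumptions made for illustration and are not taken from RunMat's source.

```rust
/// Typed index into the function table (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FunctionId(pub u32);

/// Typed index into the binding table (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BindingId(pub u32);

/// A named variable with a language identity, independent of any VM slot.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Binding {
    pub name: String,
    pub owner: FunctionId,
}

/// Illustrative assembly with only two tables.
#[derive(Debug, Default)]
pub struct Assembly {
    pub functions: Vec<String>,
    pub bindings: Vec<Binding>,
}

impl Assembly {
    pub fn add_function(&mut self, name: &str) -> FunctionId {
        self.functions.push(name.to_string());
        FunctionId((self.functions.len() - 1) as u32)
    }

    pub fn add_binding(&mut self, name: &str, owner: FunctionId) -> BindingId {
        self.bindings.push(Binding { name: name.to_string(), owner });
        BindingId((self.bindings.len() - 1) as u32)
    }

    pub fn binding(&self, id: BindingId) -> &Binding {
        &self.bindings[id.0 as usize]
    }
}
```

Because the ID types are distinct, a `BindingId` cannot be passed where a `FunctionId` is expected, and two variables that happen to share a name in different functions still get different identities.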
Entries in these tables are identified by semantic IDs, not VM slots. That distinction is important for MATLAB because users think in terms of variables, functions, files, classes, packages, and workspaces, not storage offsets. A binding like total in a nested function has a language identity before the VM decides where it lives in a frame. A class method belongs to a class before the runtime decides how to dispatch it. A function call can refer to a builtin, bound function, imported path, dynamic name, or external boundary before bytecode is emitted.

A HIR function carries its MATLAB ABI:
- fixed inputs
- varargin
- fixed outputs
- varargout
- implicit nargin
- implicit nargout
- captures
- parent function
- enclosing class, if any
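The input and output parts of that list can be sketched as a record that a call site is checked against. This is a hypothetical illustration: `FunctionAbi`, `AbiError`, and `check_call` are invented names, and captures, parent function, and enclosing class are left out. The checks follow standard MATLAB behavior, where passing fewer inputs than declared is allowed but passing too many inputs or requesting too many outputs is an error unless varargin or varargout is present.

```rust
/// Hypothetical record of a function's MATLAB-visible calling convention.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionAbi {
    pub fixed_inputs: Vec<String>,
    pub has_varargin: bool,
    pub fixed_outputs: Vec<String>,
    pub has_varargout: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AbiError {
    TooManyInputs,
    TooManyOutputs,
}

impl FunctionAbi {
    /// Check a call site against the signature and return the
    /// (nargin, nargout) pair the callee would observe.
    pub fn check_call(
        &self,
        supplied_inputs: usize,
        requested_outputs: usize,
    ) -> Result<(usize, usize), AbiError> {
        if !self.has_varargin && supplied_inputs > self.fixed_inputs.len() {
            return Err(AbiError::TooManyInputs);
        }
        if !self.has_varargout && requested_outputs > self.fixed_outputs.len() {
            return Err(AbiError::TooManyOutputs);
        }
        Ok((supplied_inputs, requested_outputs))
    }
}
```

With a two-input, two-output signature for `lowpass`, the call in the opening example would yield `(2, 2)`, which is what the callee would see through nargin and nargout.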
With those tables and ABI records, RunMat can represent the opening example as language structure instead of VM stack shape: SignalModel can be resolved as a constructor, filters.lowpass can be resolved as a package function or method-like call, the call can carry an exact requested output count of two, model.history{end + 1} can be marked as an indexed assignment target, and plot(filtered) can be treated as an effectful plotting call.

The same structure also covers other MATLAB function behavior that engineers rely on: local functions, nested functions that share parent scope, anonymous functions, function handles, nargin, nargout, varargin, and varargout. If the compiler loses those relationships, the code may still parse, but it will not behave like MATLAB.

Calls also carry semantic identity. A call target can be a bound function, builtin, imported path, dynamic expression, super constructor, super method, or unresolved qualified name. The call also carries requested output count and source syntax. That is how later stages know the difference between direct calls, method syntax, dotted calls, feval, and output-expanding calls.

Indexing is also semantic. HIR records the index kind and result context: paren or brace, read or assignment, deletion target or comma-list expansion, function-argument expansion or ordinary single read.
That context becomes critical later because the same surface syntax can mean read, write, deletion, expansion, or overloaded object dispatch.

MIR: control flow and compiler facts

HIR is still close to MATLAB language structure. MIR is lower-level. It is the form where RunMat turns language meaning into explicit control flow and assignable places.

RunMat's MIR assembly contains bodies keyed by function. A MIR body has locals and basic blocks. Blocks contain statements and a terminator. Places represent assignable locations. Rvalues represent computed values. MIR can represent:
- local and binding places
- member places
- dynamic member places
- indexed places
- assignments
- multi-assignments
- expression statements
- workspace effects
- environment effects
- branches
- loops
- switch
- try/catch
- return
- await
- future creation
- spawn
- tensor, cell, struct, and object literals
- calls with requested output counts
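The body, block, place, and rvalue vocabulary can be sketched with a few Rust types. This is a minimal sketch covering only a handful of the items above; the type names are assumptions chosen for illustration, and RunMat's real MIR is richer. The point it tries to show is that a place on the left of an assignment and a read of the same place on the right are different nodes.

```rust
use std::collections::BTreeSet;

/// An assignable location (hypothetical, reduced to two kinds).
#[derive(Debug, Clone, PartialEq)]
pub enum Place {
    Local(usize),
    Indexed { base: usize, index: Box<Rvalue> },
}

/// A computed value; reading a place is explicit.
#[derive(Debug, Clone, PartialEq)]
pub enum Rvalue {
    Const(f64),
    Read(Place),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    Assign { dest: Place, value: Rvalue },
    Expr(Rvalue),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Terminator {
    Goto(usize),
    Branch { cond: Rvalue, then_block: usize, else_block: usize },
    Return,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BasicBlock {
    pub statements: Vec<Statement>,
    pub terminator: Terminator,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Body {
    pub local_count: usize,
    pub blocks: Vec<BasicBlock>,
}

impl Body {
    /// Locals written anywhere in the body, directly or through an index.
    /// Reads of the same locals on the right-hand side are not counted.
    pub fn written_locals(&self) -> Vec<usize> {
        let mut seen = BTreeSet::new();
        for block in &self.blocks {
            for statement in &block.statements {
                if let Statement::Assign { dest, .. } = statement {
                    let local = match dest {
                        Place::Local(local) => *local,
                        Place::Indexed { base, .. } => *base,
                    };
                    seen.insert(local);
                }
            }
        }
        seen.into_iter().collect()
    }
}
```

A statement like `A(i) = A(1)` becomes an `Assign` whose `dest` is an indexed place on `A` and whose `value` is a `Read` of a different indexed place on `A`, so an analysis can tell the write from the read without inspecting source text.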
MIR gives the compiler a form that is easier to analyze. For example, initialization analysis is more natural on MIR blocks than on a source syntax tree. So are spawn-safety analysis, fusion candidate detection, and validation that an indexed assignment target was lowered with an assignment context.

MIR also gives RunMat a place to keep "place" and "value" roles separate. A read from A(i) and a write to A(i) share source syntax but have different roles. MIR can preserve that difference explicitly.

Analysis: facts that feed execution

RunMat's MIR analysis stores facts that later stages can use. Examples include:
- whether a local is unassigned, maybe assigned, or definitely assigned
- simple type facts
- shape facts
- value-flow facts
- async/future facts
- spawn boundary metadata
- diagnostics
- fusion eligibility signals
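The first fact in that list has a natural three-state form. The sketch below shows one conventional way to merge such states where two control-flow paths meet; it is a generic dataflow illustration with invented names (`InitState`, `join`, `join_all`), not a description of how RunMat's analysis store is implemented.

```rust
/// Assignment state of one local at one program point (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitState {
    Unassigned,
    MaybeAssigned,
    DefinitelyAssigned,
}

impl InitState {
    /// Merge the states arriving from two control-flow predecessors.
    /// Agreement is kept; any disagreement becomes MaybeAssigned.
    pub fn join(self, other: InitState) -> InitState {
        if self == other {
            self
        } else {
            InitState::MaybeAssigned
        }
    }
}

/// Merge per-local states at a join point, one entry per local.
pub fn join_all(a: &[InitState], b: &[InitState]) -> Vec<InitState> {
    a.iter().zip(b.iter()).map(|(&x, &y)| x.join(y)).collect()
}
```

For a local assigned in only one arm of an if statement, the two incoming states are `DefinitelyAssigned` and `Unassigned`, and the merged state after the branch is `MaybeAssigned`.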
These facts are not only for error messages. They feed the runtime architecture. The point is not to make MATLAB statically typed; it is to give the runtime enough shared facts that execution, tooling, and acceleration do not each invent their own approximate model of the program.

For acceleration, semantic and MIR facts give the planner more to work with than a bytecode scan. The compiler can identify MIR regions that are semantically pure enough or structurally suitable enough to be fusion candidates.