Intermediate Representation

As Tint has grown the number of transforms on the AST has grown. This growth has lead to several issues:

  1. Transforms rebuild the AST and SEM which causes slowness
  2. Transforming in AST can be difficult as the AST is hard to work with

In order to address these goals, an IR was introduced into Tint. The IR is mutable, it holds the needed state in order to be transformed. The IR is also translatable back into AST. It will be possible to generate an AST, convert to IR, transform, and then rebuild a new AST. This round-trip ability provides a few features:

  1. Easy to integrate into current system by replacing AST transforms piecemeal
  2. Easier to test as the resulting AST can be emitted as WGSL and compared.

The IR helps with the complexity of the AST transforms by limiting the representations in the IR form. For example, instead of for, while and loop constructs there is a single loop construct. alias and const_assert nodes are not emitted into IR. Dead code maybe eliminated during IR construction.

As the IR can convert into AST, we could potentially simplify the SPIRV-Reader by generating IR directly. The IR is closer to what SPIR-V looks like, so maybe a simpler transform.

Design

The IR is composed of three primary concepts, blocks, values, and instructions.

The blocks provide lists of instructions which are executed sequentially. An instruction may contain blocks, value operands and value results. A value is a block argument, function parameter, constant result, or a result computed by an instruction.

Each block ends in a terminator expression. There are a few blocks which maybe empty, if-false block, loop-initializer and loop-continuing. Those blocks are optional. The terminators may hold values which are returned out of the block being terminated.

Control flow is handled through ControlInstructions. These are the if, switch and loop instructions. Each control instruction contains blocks for the various control flow paths.

Transforming from AST to IR and back to AST is a lossy operation. The resulting AST when converting back may not be the same as the AST being provided. This transformation is intentional as it greatly simplifies the number of things to consider in the IR. For instance:

  • No alias nodes
  • No const_assert nodes

Code Structure

The code is contained in the src/tint/ir folder and is broken down into several classes. Note, the IR is a Tint internal representation and these files should never appear in the public API.

Builder

The Builder class provides useful helper routines for creating IR content. The Builder references an ir::Module.

Module

The top level of the IR is the Module. The module stores a list of functions, constant manager, type manager, allocators and various other bits of information needed by the IR. The module may contain a disassembled representation of the module if the disassembler has been called.

A free ir::FromProgram method is provided to convert from a Program to an ir::Module. A similar ToProgram free method is provided for the reverse conversion.

Transforms

Similar to the AST a transform system is available for IR. The transform has the same setup as the AST (and inherits from the same base transform class.)

The IR transforms live in src/tint/lang/core/ir/transform. These transforms are for use by the various IR generator backends.

Unlike with the AST transforms, the IR transforms know which transforms must have run before themselves. The transform manager will then verify that all the required transforms for a given transform have executed.

Validation

The IR contains a validator. The validator checks for common errors encountered when building IR. The validator is not run in production as any error is an internal coding error. The validator is executed by the generators before starting, and after each transform executes in debug mode.

Naming

The instructions are internal to the IR, and the values in the IR do not contain names. The IR module has a symbol table which can be used to retrieve names for the instructions and values but those names may not be the names provided originally by the AST.

Control Flow

Block

A block contains the instruction lists for a section of code. A block always ends in a terminator instruction.

The instructions in a block can be walked in linear scoped fashion. The control instructions do not need to be entered to walk a blocks instructions.

Control Instructions

The control instructions provide containers for other blocks. These include the if, loop and switch instruction. A control instruction is a self contained scope on top of the containing block scope.

Control Instruction -- If

The if instruction is an if-else structure. There are no else-if entries, they get moved into the else of the if. The if instruction has two internal blocks, the True and False blocks.

A if instruction will always have a True target. The False target maybe empty.

The sub-blocks under the if can be terminated with ExitIf, ExitSwitch, ExitLoop, Continue or Return.

Control Instruction -- Loop

All of the loop structures in AST merge down to a single loop instruction. The loop contains the Body, which is required and optional Initializer and Continuing blocks.

The loop body can be terminated with a ExitLoop, Continue, NextIteration or Return.

The loop continuing can be terminated with a NextIteration, or a BreakIf terminator.

A while loop is decomposed as listed in the WGSL spec:

while (a < b) {
  c += 1;
}

becomes:

loop {
  if (!(a < b)) {
    break;
  }
  c += 1;
}

A for loop is decomposed as listed in the WGSL spec:

for (var i = 0; i < 10; i++) {
  c += 1;
}

becomes:

var i = 0;
loop {
  if (!(i < 10)) {
    break;
  }

  c += 1;

  continuing {
    i++;
  }
}

Control Instruction -- Switch

The switch instruction has a block for each of the case/default labels.

The switch case blocks can be terminated with an ExitSwitch, Continue, or Return.

Expressions

All expressions in IR are single operations. There are no complex expressions. Any complex expression in the AST is broke apart into the simpler single operation components.

var a = b + c - (4 * k);

becomes:

%0:i32 = add %b, %c
%1:i32 = mul 4u, %k
%2:i32 = sub %0, %1
%v:ptr<function, i32, read_write> = var %2

This also means that many of the short forms i += 1, i++ get expanded into the longer form of i = i + 1.

Short-Circuit Expressions

The short-circuit expressions (e.g. a && b) are convert into an if structure control flow.

let c = a() && b()

becomes

let c = a();
if (c) {
  c = b();
}

Values

There are several types of values used in the SSA form.

  1. Constant Value
  2. InstructionResult Value
  3. Function Value
  4. FunctionParam Value
  5. BlockParam Value
Constant Value

All values in IR are concrete, there are no abstract values as materialization has already happened. Each constant holds a lower level constant::Value from the src/tint/lang/core/ir/constant system.

InstructionResult Value

The InstructionResult is the result of an instruction. The temporaries are created as complex expressions are broken down into pieces. The result tracks the usage for the value so you can determine which instructions use this value. The InstructionResult also points back to its Source instruction.

Function Value

All Functions are values in the IR. This allows the function to be provided as a Value argument to things like the Call instructions.

FunctionParam Value

The function param values are used to store information on the values being passed into a function call.

BlockParam Values

The BlockParam values are used to pass information for a MultiInBlock (e.g. Loop body, and loop continuing).

Alternatives

Instead of going to a custom IR there are several possible other roads that could be travelled.

Mutable AST

Tint originally contained a mutable AST. This was converted to immutable in order to allow processing over multiple threads and for safety properties. Those desires still hold, the AST is public API, and we want it to be as safe as possible, so keeping it immutable provides that guarantee.

Multiple Transforms With One Program Builder

Instead of generating an immutable AST after each transform, running multiple transforms on the single program builder would remove some of the performance penalties of going to and from immutable AST. While this is true, the transforms use a combination of AST and SEM information. When they transform they do not create new SEM information. That means, after a given transform, the SEM is out of date. In order to re-generate the SEM the resolver needs to be rerun. Supporting this would require being very careful on what transforms run together and how they modify the AST.

Adopt An Existing IR

There are already several IRs in the while, Mesa has NIR, LLVM has LLVM IR. There are others, adopting one of those would remove the requirements of writing and maintaining our own IR. While that is true, there are several downsides to this re-use. The IRs are internal to the library, so the API isn't public, LLVM IR changes with each iteration of LLVM. This would require us to adapt the AST -> IR -> AST transform for each modification of the IR.

They also end up being lower level then is strictly useful for us. While the IR in Tint is a simplified form, we still have to be able to go back to the high level structured form in order to emit the resulting HLSL, MSL, GLSL, etc. (Only SPIR-V is a good match for the lowered IR form). This transformation back is not a direction other IRs maybe interested in so may have lost information, or require re-determining (determining variables from SSA and PHI nodes for example).

Other technical reasons are the maintenance of BUILD.gn and CMake files in order to integrate into our build systems, along with resulting binary size questions from pulling in external systems.