blob: 0357c7a897049f22170fce629208a2feb59978d6 [file] [log] [blame] [view]
# Intermediate Representation
As Tint has grown the number of transforms on the AST has grown. This
growth has lead to several issues:
1. Transforms rebuild the AST and SEM which causes slowness
1. Transforming in AST can be difficult as the AST is hard to work with
In order to address these goals, an IR was introduced into Tint. The IR
is mutable, it holds the needed state in order to be transformed. The IR
is also translatable back into AST. It will be possible to generate an
AST, convert to IR, transform, and then rebuild a new AST. This
round-trip ability provides a few features:
1. Easy to integrate into current system by replacing AST transforms
piecemeal
1. Easier to test as the resulting AST can be emitted as WGSL and
compared.
The IR helps with the complexity of the AST transforms by limiting the
representations in the IR form. For example, instead of `for`, `while`
and `loop` constructs there is a single `loop` construct. `alias` and
`const_assert` nodes are not emitted into IR. Dead code maybe eliminated
during IR construction.
As the IR can convert into AST, we could potentially simplify the
SPIRV-Reader by generating IR directly. The IR is closer to what SPIR-V
looks like, so maybe a simpler transform.
## Design
The IR is composed of three primary concepts, `blocks`, `values`, and
`instructions`.
The `blocks` provide lists of `instructions` which are executed
sequentially. An `instruction` may contain `blocks`, `value` operands
and `value` results. A `value` is a block argument, function parameter,
constant result, or a result computed by an `instruction`.
Each `block` ends in a `terminator` expression. There are a few blocks
which maybe empty, `if-false block`, `loop-initializer` and
`loop-continuing`. Those blocks are optional. The `terminators` may hold
`values` which are returned out of the block being terminated.
Control flow is handled through `ControlInstructions`. These are the
`if`, `switch` and `loop` instructions. Each control instruction
contains `blocks` for the various control flow paths.
Transforming from AST to IR and back to AST is a lossy operation.
The resulting AST when converting back may not be the same as the
AST being provided. This transformation is intentional as it greatly
simplifies the number of things to consider in the IR. For instance:
* No `alias` nodes
* No `const_assert` nodes
### Code Structure
The code is contained in the `src/tint/ir` folder and is broken down
into several classes. Note, the IR is a Tint _internal_ representation
and these files should _never_ appear in the public API.
#### Builder
The `Builder` class provides useful helper routines for creating IR
content. The Builder references an `ir::Module`.
#### Module
The top level of the IR is the `Module`. The module stores a list of
`functions`, constant manager, type manager, allocators and various other
bits of information needed by the IR. The `module` _may_ contain a
disassembled representation of the module if the disassembler has been
called.
A free `ir::FromProgram` method is provided to convert from a
`Program` to an `ir::Module`. A similar `ToProgram` free method is
provided for the reverse conversion.
### Transforms
Similar to the AST a transform system is available for IR. The transform
has the same setup as the AST (and inherits from the same base transform
class.)
The IR transforms live in `src/tint/lang/core/ir/transform`. These transforms are
for use by the various IR generator backends.
Unlike with the AST transforms, the IR transforms know which transforms
must have run before themselves. The transform manager will then verify
that all the required transforms for a given transform have executed.
### Validation
The IR contains a validator. The validator checks for common errors
encountered when building IR. The validator is _not_ run in production
as any error is an internal coding error. The validator is executed by
the generators before starting, and after each transform executes in
debug mode.
### Naming
The instructions are internal to the IR, and the values in the IR do not
contain names. The IR module has a symbol table which can be used to
retrieve names for the instructions and values but those names may not
be the names provided originally by the AST.
### Control Flow
#### Block
A block contains the instruction lists for a section of code. A block
always ends in a terminator instruction.
The instructions in a block can be walked in linear scoped fashion. The
control instructions do not need to be entered to walk a blocks
instructions.
#### Control Instructions
The control instructions provide containers for other blocks. These
include the `if`, `loop` and `switch` instruction. A control instruction
is a self contained scope on top of the containing block scope.
##### Control Instruction -- If
The if instruction is an `if-else` structure. There are no `else-if`
entries, they get moved into the `else` of the `if`. The if instruction
has two internal blocks, the `True` and `False` blocks.
A if instruction will _always_ have a `True` target. The `False` target
maybe empty.
The sub-blocks under the `if` can be terminated with `ExitIf`,
`ExitSwitch`, `ExitLoop`, `Continue` or `Return`.
#### Control Instruction -- Loop
All of the loop structures in AST merge down to a single loop
instruction. The loop contains the `Body`, which is required and
optional `Initializer` and `Continuing` blocks.
The `loop` body can be terminated with a `ExitLoop`, `Continue`,
`NextIteration` or `Return`.
The `loop` continuing can be terminated with a `NextIteration`, or a
`BreakIf` terminator.
A while loop is decomposed as listed in the WGSL spec:
```
while (a < b) {
c += 1;
}
```
becomes:
```
loop {
if (!(a < b)) {
break;
}
c += 1;
}
```
A for loop is decomposed as listed in the WGSL spec:
```
for (var i = 0; i < 10; i++) {
c += 1;
}
```
becomes:
```
var i = 0;
loop {
if (!(i < 10)) {
break;
}
c += 1;
continuing {
i++;
}
}
```
#### Control Instruction -- Switch
The switch instruction has a block for each of the `case/default` labels.
The `switch` case blocks can be terminated with an `ExitSwitch`,
`Continue`, or `Return`.
#### Expressions
All expressions in IR are single operations. There are no complex
expressions. Any complex expression in the AST is broke apart into the
simpler single operation components.
```
var a = b + c - (4 * k);
```
becomes:
```
%0:i32 = add %b, %c
%1:i32 = mul 4u, %k
%2:i32 = sub %0, %1
%v:ptr<function, i32, read_write> = var %2
```
This also means that many of the short forms `i += 1`, `i++` get
expanded into the longer form of `i = i + 1`.
##### Short-Circuit Expressions
The short-circuit expressions (e.g. `a && b`) are convert into an `if`
structure control flow.
```
let c = a() && b()
```
becomes
```
let c = a();
if (c) {
c = b();
}
```
#### Values
There are several types of values used in the SSA form.
1. `Constant` Value
1. `InstructionResult` Value
1. `Function` Value
1. `FunctionParam` Value
1. `BlockParam` Value
##### Constant Value
All values in IR are concrete, there are no abstract values as
materialization has already happened. Each constant holds a lower level
`constant::Value` from the `src/tint/lang/core/ir/constant` system.
##### InstructionResult Value
The `InstructionResult` is the result of an instruction. The
temporaries are created as complex expressions are broken down into
pieces. The result tracks the usage for the value so you can determine
which instructions use this value. The `InstructionResult` also points
back to its `Source` instruction.
### Function Value
All `Functions` are values in the IR. This allows the function to be
provided as a Value argument to things like the `Call` instructions.
##### FunctionParam Value
The function param values are used to store information on the values
being passed into a function call.
### BlockParam Values
The `BlockParam` values are used to pass information for a
`MultiInBlock` (e.g. Loop body, and loop continuing).
## Alternatives
Instead of going to a custom IR there are several possible other roads
that could be travelled.
### Mutable AST
Tint originally contained a mutable AST. This was converted to immutable
in order to allow processing over multiple threads and for safety
properties. Those desires still hold, the AST is public API, and we want
it to be as safe as possible, so keeping it immutable provides that
guarantee.
### Multiple Transforms With One Program Builder
Instead of generating an immutable AST after each transform, running
multiple transforms on the single program builder would remove some of
the performance penalties of going to and from immutable AST. While this
is true, the transforms use a combination of AST and SEM information.
When they transform they _do not_ create new SEM information. That
means, after a given transform, the SEM is out of date. In order to
re-generate the SEM the resolver needs to be rerun. Supporting this
would require being very careful on what transforms run together and
how they modify the AST.
### Adopt An Existing IR
There are already several IRs in the while, Mesa has NIR, LLVM has
LLVM IR. There are others, adopting one of those would remove the
requirements of writing and maintaining our own IR. While that is true,
there are several downsides to this re-use. The IRs are internal to the
library, so the API isn't public, LLVM IR changes with each iteration of
LLVM. This would require us to adapt the AST -> IR -> AST transform for
each modification of the IR.
They also end up being lower level then is strictly useful for us. While
the IR in Tint is a simplified form, we still have to be able to go back
to the high level structured form in order to emit the resulting HLSL,
MSL, GLSL, etc. (Only SPIR-V is a good match for the lowered IR form).
This transformation back is not a direction other IRs maybe interested
in so may have lost information, or require re-determining (determining
variables from SSA and PHI nodes for example).
Other technical reasons are the maintenance of BUILD.gn and CMake files
in order to integrate into our build systems, along with resulting
binary size questions from pulling in external systems.