# Tint Fuzzers
Tint currently has two fuzzer target executables:
[`tint_wgsl_fuzzer`](#tint_wgsl_fuzzer) which takes WGSL source code
as input and [`tint_ir_fuzzer`](#tint_ir_fuzzer) which takes a
protobuf binary file as input.
Both fuzzers are implemented using
[libFuzzer](https://llvm.org/docs/LibFuzzer.html), and are intended to
be automatically and continuously run by Chromium's ClusterFuzz
infrastructure. Currently only `tint_wgsl_fuzzer` is deployed on
ClusterFuzz.
[Fuzzer targets are automatically found via `gn
refs`](https://chromium.googlesource.com/chromium/src/+/HEAD/testing/libfuzzer/libFuzzer_integration.md). See
[`tint.gni`](../../src/tint/tint.gni) for the core fuzzer target
rules.
Tint's fuzzers are implemented as functions registered with the
macros:
- [`TINT_WGSL_PROGRAM_FUZZER()`](#registering-a-new-tintprogram-fuzzer)
registers a fuzzer function that is handed a `tint::Program`.
- [`TINT_IR_MODULE_FUZZER()`](#registering-a-new-tintcoreirmodule-fuzzer)
registers a fuzzer function that is handed a
`tint::core::ir::Module`.
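Macros like these typically rely on a self-registration pattern, in which a namespace-scope variable's initializer adds the function to a global list. The sketch below is a simplified, self-contained illustration of that pattern and is not Tint's actual implementation. `sketch::Registry`, `sketch::Register`, `SKETCH_REGISTER_FUZZER` and the `demo` namespace are hypothetical names, and the toy fuzzer receives raw text rather than a `tint::Program`.

```c++
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

namespace sketch {

// Toy fuzzer signature. Tint's real fuzzer functions receive a
// tint::Program or a tint::core::ir::Module instead of raw text.
using FuzzerFn = std::function<void(std::string_view)>;

// One registered fuzzer: a printable name plus the function to call.
struct Entry {
    std::string name;
    FuzzerFn fn;
};

// The registry lives in a function-local static so that it is constructed
// on first use, regardless of the order in which registrations run.
inline std::vector<Entry>& Registry() {
    static std::vector<Entry> entries;
    return entries;
}

// Appends a fuzzer to the registry. Returns a value so that the call can
// be used to initialize a namespace-scope variable.
inline bool Register(std::string name, FuzzerFn fn) {
    assert(fn && "a registered fuzzer must be callable");
    Registry().push_back(Entry{std::move(name), std::move(fn)});
    return true;
}

// Calls every registered fuzzer with the same input.
inline void RunAll(std::string_view input) {
    for (const Entry& entry : Registry()) {
        entry.fn(input);
    }
}

}  // namespace sketch

// Hypothetical registration macro: defines a uniquely named variable whose
// initializer registers FN during static initialization.
#define SKETCH_CONCAT_INNER(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_INNER(a, b)
#define SKETCH_REGISTER_FUZZER(FN)                                            \
    [[maybe_unused]] static const bool SKETCH_CONCAT(sketch_reg_, __LINE__) = \
        ::sketch::Register(#FN, FN)

namespace demo {

// Counts how many times the toy fuzzer below has been invoked.
inline int& CallCount() {
    static int count = 0;
    return count;
}

void CountingFuzzer(std::string_view /*input*/) {
    ++CallCount();
}

}  // namespace demo

SKETCH_REGISTER_FUZZER(demo::CountingFuzzer);
```

With this in place, a harness only needs to call `sketch::RunAll()` with each input, and new fuzzer functions become visible to it simply by being linked in.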
## Building
The `tint_wgsl_fuzzer` target can be built with either CMake or GN:
- CMake: Define `TINT_BUILD_FUZZERS=1` (pass `-DTINT_BUILD_FUZZERS=1`
to `cmake`)
- GN: Define `use_libfuzzer = true` in `args.gn`.
Building `tint_ir_fuzzer` requires additional flags to be set:
- CMake: Define `TINT_BUILD_IR_BINARY=1` and `TINT_BUILD_IR_FUZZER=1`
(pass `-DTINT_BUILD_IR_BINARY=1` `-DTINT_BUILD_IR_FUZZER=1` to
`cmake`)
- GN: Define `tint_build_ir_binary = true` and `tint_build_ir_fuzzer =
true` in `args.gn`.
## Running fuzzers
### Local fuzzing
The [`tint_wgsl_fuzzer`](#tint_wgsl_fuzzer) and
[`tint_ir_fuzzer`](#tint_ir_fuzzer) executables accept the [standard
`libFuzzer` command line
arguments](https://llvm.org/docs/LibFuzzer.html#options) with
[extended command line arguments](#extended-command-line-arguments)
described below.
There's also a helper tool to run the fuzzers locally:
- To run the local fuzzers across the full number of CPU threads
available on the system, seeded with the corpus in
[`test/tint`](../../test/tint), and using the dictionary in
`src/tint/cmd/fuzz/wgsl/dictionary.txt` run:
`tools/run fuzz`
- To check that all the test files in the corpus directory,
[`test/tint`](../../test/tint) by default, pass the fuzzers without
error and then exit, run:
`tools/run fuzz --check`
Note: This is run by Dawn's CQ presubmit to check that fuzzers
aren't accidentally broken.
- To run the local fuzzers using the same corpus used by ClusterFuzz:
`tools/run fuzz -corpus out/libfuzz/gen/fuzzers/wgsl_corpus`
Note that this corpus directory is generated when building the GN
target `tint_generate_wgsl_corpus`.
### Generating the corpus
Generate the `tint_wgsl_fuzzer` corpus using the
`tint_generate_wgsl_corpus` GN target, which produces a corpus in
`<build_dir>/gen/fuzzers/wgsl_corpus`. Pass in the path to the corpus
directory as an argument to the fuzzer executable to use it. It's also
a good idea to pass in the dictionary with
`-dict=src/tint/cmd/fuzz/wgsl/dictionary.txt`:
```bash
autoninja -C out/libfuzz tint_generate_wgsl_corpus
out/libfuzz/tint_wgsl_fuzzer.exe -dict=src/tint/cmd/fuzz/wgsl/dictionary.txt out/libfuzz/gen/fuzzers/wgsl_corpus
```
Similarly, the `tint_ir_fuzzer` corpus can be generated using the
`tint_generate_ir_corpus` GN target, producing the corpus in
`<build_dir>/gen/fuzzers/ir_corpus`. For the IR fuzzer, we don't pass
in the dictionary, since we are using
[`libprotobuf-mutator`](https://github.com/google/libprotobuf-mutator)
for mutating (the proto file effectively defines the dictionary):
```bash
autoninja -C out/libfuzz tint_generate_ir_corpus
out/libfuzz/tint_ir_fuzzer.exe out/libfuzz/gen/fuzzers/ir_corpus
```
## Writing fuzzers
### Registering a new `tint::Program` fuzzer
1. Create a new source file with a `_fuzz.cc` suffix.
2. `#include "src/tint/cmd/fuzz/wgsl/fuzz.h"`
3. Define a function in a (possibly nested) anonymous namespace with
one of the signatures:
- `void MyFuzzer(const tint::Program& program /*, ...additional fuzzed parameters... */) {`
- `void MyFuzzer(const tint::Program& program, const tint::fuzz::wgsl::Context& context /*, ...additional fuzzed parameters... */) {`
The optional `context` parameter holds information about the
`Program` and the environment used to run the fuzzers.
[Note: Any number of additional fuzzer-populated parameters can be
appended to the function signature.](#additional-fuzzer-data)
4. Implement your fuzzer function, using `TINT_ICE()` to catch invalid
state. Return early if the fuzzer cannot handle the input.
5. At the bottom of the file, in the global namespace, register the
fuzzer with: `TINT_WGSL_PROGRAM_FUZZER(MyFuzzer);`
6. Use `tools/run gen build` to generate the build files for this new
fuzzer.
Example:
```c++
#include "src/tint/cmd/fuzz/wgsl/fuzz.h"
namespace tint::my_namespace {
namespace {
bool CanRun(const tint::Program& program) {
if (program.AST().HasOverrides()) {
return false; // Overrides are not supported.
}
return true;
}
void MyWGSLFuzzer(const tint::Program& program, bool a_fuzzer_provided_value) {
if (!CanRun(program)) {
return;
}
// Check something with program.
}
} // namespace
} // namespace tint::my_namespace
TINT_WGSL_PROGRAM_FUZZER(tint::my_namespace::MyWGSLFuzzer);
```
### Registering a new `tint::core::ir::Module` fuzzer
1. Create a new source file with a `_fuzz.cc` suffix.
2. `#include "src/tint/cmd/fuzz/ir/fuzz.h"`
3. Define a function in a (possibly nested) anonymous namespace with
the signature:
- `void MyFuzzer(core::ir::Module& module /*, ...additional fuzzed parameters... */) {`
[Note: Any number of additional fuzzer-populated parameters can be
appended to the function signature.](#additional-fuzzer-data)
4. Implement your fuzzer function, using `TINT_ICE()` to catch invalid
state. Return early if the fuzzer cannot handle the input.
5. At the bottom of the file, in the global namespace, register the
fuzzer with: `TINT_IR_MODULE_FUZZER(MyFuzzer);`
[Note: Different capabilities often need to be enabled before and after
a fuzzer function runs to ensure validation passes. These can be
declared using `pre_capabilities` and `post_capabilities` when
registering the fuzzer.]
6. Use `tools/run gen build` to generate the build files for this new
fuzzer.
Example:
```c++
#include "src/tint/cmd/fuzz/ir/fuzz.h"
namespace tint::my_namespace {
namespace {
void MyIRFuzzer(core::ir::Module& module) {
// Do something interesting with module.
}
} // namespace
} // namespace tint::my_namespace
TINT_IR_MODULE_FUZZER(tint::my_namespace::MyIRFuzzer);
```
### Additional fuzzer data
WGSL and IR fuzzer functions can also declare any number of additional
parameters, which will be populated with fuzzer provided data. These
additional parameters must come at the end of the signatures described
above, and can be of the following types:
- Any integer, float or bool type.
- Any structure reflected with `TINT_REFLECT`.
Note: It's recommended to pass these by `const` reference to
avoid pass-by-value overheads.
- Any enum reflected with `TINT_REFLECT_ENUM_RANGE`.
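To make the idea of fuzzer-populated parameters more concrete, the sketch below shows one plausible way to turn a raw byte stream into typed values. It is an illustration only and does not describe how Tint decodes its data: `ByteReader`, `ToyOptions` and `ReadToyOptions` are hypothetical, and both the zero-fill behavior on short input and the use of a byte's lowest bit for `bool` are assumptions of this sketch.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical reader that hands out values from fuzzer-provided bytes.
class ByteReader {
  public:
    explicit ByteReader(std::vector<uint8_t> data) : data_(std::move(data)) {}

    // Reads an integer or float by copying its bytes from the stream. Bytes
    // past the end of the stream read as zero, so decoding never fails.
    template <typename T>
    T Read() {
        static_assert(std::is_arithmetic_v<T> && !std::is_same_v<T, bool>,
                      "use ReadBool() for bool");
        uint8_t bytes[sizeof(T)] = {};
        const size_t available = offset_ < data_.size() ? data_.size() - offset_ : 0;
        const size_t count = available < sizeof(T) ? available : sizeof(T);
        if (count > 0) {
            std::memcpy(bytes, data_.data() + offset_, count);
        }
        offset_ += sizeof(T);
        T value;
        std::memcpy(&value, bytes, sizeof(T));
        return value;
    }

    // Reads one byte and uses its lowest bit, so any byte is a valid bool.
    bool ReadBool() { return (Read<uint8_t>() & 1u) != 0; }

  private:
    std::vector<uint8_t> data_;
    size_t offset_ = 0;
};

// Stand-in for a reflected options structure.
struct ToyOptions {
    bool enable_feature = false;
    uint8_t level = 0;
};

// Fills each field in declaration order.
inline ToyOptions ReadToyOptions(ByteReader& reader) {
    ToyOptions options;
    options.enable_feature = reader.ReadBool();
    options.level = reader.Read<uint8_t>();
    return options;
}

// Usage example: two bytes populate the structure, and later reads see zeros.
inline void ByteReaderExample() {
    const std::vector<uint8_t> bytes = {0x01, 0x07};
    ByteReader reader(bytes);
    const ToyOptions options = ReadToyOptions(reader);
    assert(options.enable_feature);
    assert(options.level == 7);
    assert(reader.Read<uint32_t>() == 0u);  // stream exhausted
}

}  // namespace sketch
```

Making every byte sequence decode to some value is a common choice for fuzzing, because it lets the mutator explore parameter values without having to hit an exact encoding.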
## Executable targets
Tint has two fuzzer executable targets:
### `tint_wgsl_fuzzer`
`tint_wgsl_fuzzer` [accepts WGSL textual
input](https://llvm.org/docs/LibFuzzer.html#options) and parses line
comments (`//`) as a base-64 binary encoded data stream for the
[additional fuzzer parameters](#additional-fuzzer-data).
The entry point for the fuzzer lives at
[`src/tint/cmd/fuzz/wgsl/main_fuzz.cc`](../../src/tint/cmd/fuzz/wgsl/main_fuzz.cc).
#### Extended command line arguments
On top of the [standard `libFuzzer` command line
arguments](https://llvm.org/docs/LibFuzzer.html#options), the fuzzer
supports the following extended command line arguments:
- `--help`: lists the command line arguments.
- `--filter=<name>`: only runs the fuzzer functions whose names contain
the given string.
- `--concurrent`: each of the fuzzer functions will be run on a
separate, concurrent thread. This potentially offers performance
improvements, and also tests for concurrent execution.
- `--verbose`: prints verbose information about what the fuzzer is
doing.
- `--dump`: prints shader source, including the input WGSL and the
generated HLSL, MSL, and GLSL.
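The substring matching described for `--filter` can be pictured with the short sketch below. It is an assumption-laden illustration rather than Tint's code: `SelectFuzzers` is a hypothetical helper, matching is assumed to be case-sensitive, and an empty filter is assumed to select everything. Only `tint::msl::writer::IRFuzzer` is a name taken from this document; the other names are made up.

```c++
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

// Returns the fuzzer names that contain `filter` as a substring.
// An empty filter matches every name.
inline std::vector<std::string> SelectFuzzers(const std::vector<std::string>& names,
                                              std::string_view filter) {
    std::vector<std::string> selected;
    for (const std::string& name : names) {
        if (std::string_view(name).find(filter) != std::string_view::npos) {
            selected.push_back(name);
        }
    }
    return selected;
}

// Usage example with one real and two hypothetical fuzzer names.
inline void FilterExample() {
    const std::vector<std::string> names = {
        "tint::msl::writer::IRFuzzer",   // named in this document
        "tint::example::IRFuzzer",       // hypothetical
        "tint::example::ProgramFuzzer",  // hypothetical
    };
    assert(SelectFuzzers(names, "msl").size() == 1);
    assert(SelectFuzzers(names, "IRFuzzer").size() == 2);
    assert(SelectFuzzers(names, "").size() == 3);
}

}  // namespace sketch
```

Because the match is a plain substring test, passing a fully qualified name selects a single function, while a shorter fragment such as a namespace selects a group.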
#### Behavior
The `tint_wgsl_fuzzer` will do the following:
- Base-64 decode the line comments data from the WGSL source, used to
populate the [additional fuzzer parameters](#additional-fuzzer-data).
- Parse and resolve the WGSL input, returning early if there are
any parser errors.
- Invoke each of the fuzzer functions registered with a call to
`TINT_WGSL_PROGRAM_FUZZER()`.
- Convert the `Program` to an IR module and invoke each fuzzer
function registered with `TINT_IR_MODULE_FUZZER()`. Note: Because the
module is mutable, the `Program` is converted to a fresh IR module
for each registered IR fuzzer.
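The first step above, recovering binary data from line comments, can be sketched as follows. This is a self-contained approximation, not the code in `main_fuzz.cc`: `CollectLineComments` and `DecodeBase64Lenient` are hypothetical helpers, and skipping characters outside the base-64 alphabet (including `=` padding) is an assumption about how malformed comment text might be tolerated.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

// Concatenates the text that follows `//` on each line of the source.
inline std::string CollectLineComments(std::string_view wgsl) {
    std::string payload;
    size_t pos = 0;
    while (pos < wgsl.size()) {
        size_t eol = wgsl.find('\n', pos);
        if (eol == std::string_view::npos) {
            eol = wgsl.size();
        }
        const std::string_view line = wgsl.substr(pos, eol - pos);
        const size_t slashes = line.find("//");
        if (slashes != std::string_view::npos) {
            const std::string_view text = line.substr(slashes + 2);
            payload.append(text.data(), text.size());
        }
        pos = eol + 1;
    }
    return payload;
}

// Maps a base-64 alphabet character to its 6-bit value, or -1 if invalid.
inline int Base64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes base-64 text, silently skipping padding and invalid characters.
inline std::vector<uint8_t> DecodeBase64Lenient(std::string_view text) {
    std::vector<uint8_t> out;
    uint32_t buffer = 0;  // holds the most recently read 6-bit groups
    int bits = 0;         // number of not-yet-emitted bits in `buffer`
    for (char c : text) {
        const int value = Base64Value(c);
        if (value < 0) {
            continue;
        }
        buffer = (buffer << 6) | static_cast<uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<uint8_t>((buffer >> bits) & 0xFFu));
        }
    }
    return out;
}

// Usage example: the payload is split across two comments.
inline void CommentDataExample() {
    const std::string_view wgsl = "fn f() {}  //aGVs\n//bG8=\n";
    const std::string payload = CollectLineComments(wgsl);
    assert(payload == "aGVsbG8=");
    const std::vector<uint8_t> bytes = DecodeBase64Lenient(payload);
    assert(std::string(bytes.begin(), bytes.end()) == "hello");
}

}  // namespace sketch
```

Carrying the extra data in comments keeps each test case a valid WGSL file, so the same corpus entry can still be read by ordinary WGSL tooling.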
### `tint_ir_fuzzer`
`tint_ir_fuzzer` accepts binary [protocol
buffer](https://protobuf.dev/) inputs and uses
[`libprotobuf-mutator`](https://github.com/google/libprotobuf-mutator)
for mutating this binary format directly.
The suffix `.tirb` is sometimes used for IR fuzz test case files,
which helps our tooling infer the format. The fuzzer itself does not
generate/depend on this suffix though, instead using the `libFuzzer`
standard prefixes, i.e. `crash-...` and `slow-...`, when needed.
The input protobuf, defined in
[src/tint/utils/protos/ir_fuzz/ir_fuzz.proto](../../src/tint/utils/protos/ir_fuzz/ir_fuzz.proto),
is a composite of two elements: a protobuf, defined in
[src/tint/utils/protos/ir/ir.proto](../../src/tint/utils/protos/ir/ir.proto),
that holds the actual IR for the test case, and an opaque binary
blob that holds the [additional fuzzer
parameters](#additional-fuzzer-data), akin to `tint_wgsl_fuzzer`.
The entry point for the fuzzer lives at
[`src/tint/cmd/fuzz/ir/main_fuzz.cc`](../../src/tint/cmd/fuzz/ir/main_fuzz.cc).
#### Extended command line arguments
This fuzzer accepts the same set of flags as `tint_wgsl_fuzzer` (both
`libFuzzer` and extended), except `--dump`, which is currently not
supported.
#### Behavior
The `tint_ir_fuzzer` will do the following:
- Decode the binary encoded protobuf into a `Program` for the
IR module and a binary blob that will be passed into each function
as options. If the input cannot be decoded due to being invalid or
containing select constructs that can never occur in real world
inputs, then the fuzzer will early return.
- Validate the decoded IR module, returning early if there are
any errors.
- For each fuzzer function registered via `TINT_IR_MODULE_FUZZER()`,
make a copy of the IR module, since it is mutable, and then invoke
the fuzzer function.
- Validate the resulting IR module state, and raise an error if
validation fails, since this indicates either a functional issue with
a transform or an illegal state that the initial validation missed.
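The validate, copy, run, and re-validate sequence above can be sketched with a toy module type. Everything here is a stand-in chosen for illustration: `ToyModule`, `Validate`, `Outcome` and `RunIRFuzzers` are hypothetical, the validation rule is arbitrary, and the real fuzzer raises an error where this sketch returns an enum value.

```c++
#include <cassert>
#include <functional>
#include <vector>

namespace sketch {

// Toy stand-in for a mutable IR module.
struct ToyModule {
    std::vector<int> values;
};

// Toy validation rule: every value must be non-negative.
inline bool Validate(const ToyModule& module) {
    for (int value : module.values) {
        if (value < 0) {
            return false;
        }
    }
    return true;
}

using ToyFuzzer = std::function<void(ToyModule&)>;

enum class Outcome {
    kInputRejected,  // decoded module failed the initial validation
    kPassed,         // every fuzzer left its copy in a valid state
    kOutputInvalid,  // a fuzzer produced a module that fails validation
};

// Runs each fuzzer on its own copy of the decoded module, so that mutations
// made by one fuzzer are never seen by the next.
inline Outcome RunIRFuzzers(const ToyModule& decoded, const std::vector<ToyFuzzer>& fuzzers) {
    if (!Validate(decoded)) {
        return Outcome::kInputRejected;
    }
    for (const ToyFuzzer& fuzzer : fuzzers) {
        ToyModule copy = decoded;
        fuzzer(copy);
        if (!Validate(copy)) {
            return Outcome::kOutputInvalid;
        }
    }
    return Outcome::kPassed;
}

// Usage example with one well-behaved and one buggy toy fuzzer.
inline void IRLoopExample() {
    const ToyModule input{{1, 2, 3}};
    const ToyFuzzer well_behaved = [](ToyModule& m) { m.values.push_back(4); };
    const ToyFuzzer buggy = [](ToyModule& m) { m.values[0] = -1; };

    const std::vector<ToyFuzzer> good_only = {well_behaved};
    const std::vector<ToyFuzzer> good_then_buggy = {well_behaved, buggy};

    assert(RunIRFuzzers(input, good_only) == Outcome::kPassed);
    assert(input.values.size() == 3);  // the decoded module is untouched
    assert(RunIRFuzzers(input, good_then_buggy) == Outcome::kOutputInvalid);

    const ToyModule invalid{{-5}};
    assert(RunIRFuzzers(invalid, good_only) == Outcome::kInputRejected);
}

}  // namespace sketch
```

Validating before and after separates the two failure classes: inputs that were never legal are dropped quietly, while a legal input that becomes illegal points at the fuzzed code or at a gap in the validator.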
#### Working with test cases
[Note: The tooling related to working with `tint_ir_fuzzer` test cases
is a WIP and may not be complete.]
Since the input test cases for `tint_ir_fuzzer` are in a non-human
readable format, there is additional tooling needed for working with
them.
##### ir_fuzz_dis
For displaying the IR contents of a test case file (i.e. `crash-...`,
or `foo.tirb`), there is a disassembler `ir_fuzz_dis`, which is the
easiest way to dump out the contents of a test case. This disassembler
supports a subset of the Tint CLI for outputting in various formats
and to files, etc.
The disassembler has two significant known limitations though.
First, it does not dump out the binary options blob. This is because
this data is interpreted on a per-fuzzer function basis, so there
isn't a general human readable interpretation of its contents. If you
need to understand how the binary blob is being handled by a fuzzer
you will either need to add logging or use a debugger.
The other known limitation is that this disassembler depends on Tint's
IR printing code, which assumes well-formed inputs. Sometimes it will
fail to output anything or, more subtly, not output values it didn't
expect. For example, if an instruction only expects 2 params and there
are actually 3 in the IR, then the disassembler may only print the
first two. The validator should catch these issues and give you useful
warnings, but since it is a WIP itself there may be omissions in its
implementation.
##### protoc
Due to the second limitation on the disassembler, there are times
where you will need to dump the raw contents of the binary protocol
buffer in a textual format. This can be done using the tool `protoc`,
which is part of `libprotobuf` and is either installed via a system
package or built as part of building the fuzzers. A full tutorial on
using `protoc` is beyond the scope of this document (see the [protobuf
docs](https://protobuf.dev/getting-started/cpptutorial/) for more
details), but the basic invocation looks like this:
```bash
out/Fuzzer/protoc --decode tint.cmd.fuzz.ir.pb.Root ./src/tint/utils/protos/ir_fuzz/ir_fuzz.proto < ./input.tirb
```
The important thing to note is the use of `--decode
tint.cmd.fuzz.ir.pb.Root` to decode just the IR portion of the input,
while still supplying the top-level `ir_fuzz.proto` as the format.
##### ir_fuzz_as
For generating binary test cases a rudimentary assembler,
`ir_fuzz_as`, has been implemented.
It is primarily used for converting the Tint test shaders into a seed
corpus for `tint_ir_fuzzer`, but can be used to convert user-supplied
shaders into IR test cases.
Similar to the disassembler it only operates on the shader/IR portion
of the test case format, and does not support embedding/manipulating
the binary blob portion. (There is currently no roundtrip workflow for
taking a test case binary, unpacking it, modifying it, and repacking
it with the original blob, other than manually unpacking and packing
protobufs, which is left as an exercise to the reader.)
It also suffers from the same limitation of the disassembler with
regards to not handling malformed inputs particularly well, since it
depends on the general Tint CLI parsing/printing infrastructure.
It is capable of dumping raw text of the IR protobuf that it
generates, which can be useful for understanding how the IR is being
encoded/decoded, since you can take a snippet of valid WGSL and dump
out what the protobuf looks like for it.
## Debugging
To debug a specific registered fuzzer function, one strategy is to add
a `TINT_ICE` call at the top of the function, and then run the fuzzer
with `--filter=<name>` so that only that specific fuzzer function runs.
When the function is called, the libFuzzer harness will emit a crash
file that can be used as input on subsequent runs. Remove the
`TINT_ICE` and run the fuzzer again using this crash file.
For example, if we wish to debug `tint::msl::writer::IRFuzzer`, we
would first insert a `TINT_ICE` at the top:
```c++
Result<SuccessType> IRFuzzer(core::ir::Module& module,
const fuzz::ir::Context& context,
Options options) {
TINT_ICE() << "Crash";
// Comment out the rest of the body to avoid unreachable code warnings
}
```
Build and run the fuzzer, filtering in this function:
```bash
autoninja -C out/libfuzz tint_wgsl_fuzzer
out/libfuzz/tint_wgsl_fuzzer -filter=tint::msl::writer::IRFuzzer
```
It can take a little while before libFuzzer generates a valid WGSL
input, but eventually it will call into the function and crash on the
ICE:
```
...
#71607 NEW cov: 3633 ft: 8316 corp: 1045/8248b lim: 25 exec/s: 2469 rss: 158Mb L: 8/25 MS: 2 ShuffleBytes-PersAutoDict- DE: "true"-
#71669 NEW cov: 3633 ft: 8317 corp: 1046/8266b lim: 25 exec/s: 2471 rss: 158Mb L: 18/25 MS: 2 CMP-ChangeByte- DE: "if"-
#71697 REDUCE cov: 3633 ft: 8317 corp: 1046/8264b lim: 25 exec/s: 2472 rss: 158Mb L: 6/25 MS: 3 PersAutoDict-ChangeBit-EraseBytes- DE: "\001\002"-
ICE while running fuzzer: 'tint::msl::writer::IRFuzzer'
..\..\src\tint\lang\msl\writer\writer_fuzz.cc:63 internal compiler error: Crash
==25204== ERROR: libFuzzer: deadly signal
NOTE: libFuzzer has rudimentary signal handlers.
Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 1 PersAutoDict- DE: "or"-; base unit: b34d87c378ebbbfbfd475303dcc75d1d1b2a7c7a
0x2f,0x2f,0x33,0x33,0x33,0x33,0x6f,0x72,0x33,0x33,0x33,0x33,0x33,0x33,0x33,0x33,0x33,0x33,0x2a,0x2a,0x30,0x32,
//3333or3333333333**02
artifact_prefix='./'; Test unit written to ./crash-21563a85afd5322d9e17c1c43fd3d4029778d6e7
Base64: Ly8zMzMzb3IzMzMzMzMzMzMzKiowMg==
```
Note that the second to last line specifies that the input test was
written to a file. Now we can remove the `TINT_ICE` and run the fuzzer
with just this file as input:
```bash
out/libfuzz/tint_wgsl_fuzzer ./crash-21563a85afd5322d9e17c1c43fd3d4029778d6e7
```