Start adding docs about device facilities.

Bug: dawn:373
Change-Id: I837b0fe15ff98d58caf6b69ea6d8d92bee33e52e
Reviewed-on: https://dawn-review.googlesource.com/c/dawn/+/18762
Commit-Queue: Corentin Wallez <cwallez@chromium.org>
Reviewed-by: Kai Ninomiya <kainino@chromium.org>
diff --git a/docs/device_facilities.md b/docs/device_facilities.md
new file mode 100644
index 0000000..ae75323
--- /dev/null
+++ b/docs/device_facilities.md
@@ -0,0 +1,106 @@
+# Devices
+
+In Dawn the `Device` is a "god object" that contains a lot of facilities useful for the whole object graph that descends from it.
+There are a number of facilities common to all backends that live in the frontend, as well as backend-specific facilities.
+Examples of frontend facilities are the management of content-less object caches and the toggle management.
+Examples of backend facilities are GPU memory allocators and the backing API function pointer table.
+
+## Frontend facilities
+
+### Error Handling
+
+Dawn (dawn_native) uses the error handling defined in [Error.h](../src/dawn_native/Error.h) to robustly handle errors.
+With `DAWN_TRY`, errors bubble up all the way to, and are "consumed" by, the entry point that was called by the application.
+Error consumption uses `Device::ConsumeError`, which exposes errors via the WebGPU "error scopes" and can also influence the device lifecycle by notifying of a device loss or triggering one.
+
+See [Error.h](../src/dawn_native/Error.h) for more information about using errors.
+
+### Device Lifecycle
+
+The device lifecycle is a bit more complicated than that of other objects in Dawn for multiple reasons:
+
+ - The device initialization creates facilities in both the backend and the frontend, which can fail.
+   When a device fails to initialize, it should still be possible to destroy it without crashing.
+ - Execution of commands on the GPU must be finished before the device can be destroyed (because there's no one to "DeleteWhenUnused" the device).
+ - On creation a device might want to run some GPU commands (like initializing zero-buffers), which must be completed before it is destroyed.
+ - A device can become "disconnected" when a TDR or hot-unplug happens.
+   In this case, destruction of the device doesn't need to wait on GPU commands to finish because they just disappeared.
+
+There is a state machine `State` defined in [Device.h](../src/dawn_native/Device.h) that controls all of the above.
+The most common state is `Alive`, when there are potentially GPU commands executing.
+
+Initialization of a device looks like the following:
+
+ - `DeviceBase::DeviceBase` is called and does mostly nothing except setting `State` to `BeingCreated` (and setting the initial toggles).
+ - `backend::Device::Initialize` creates things like the underlying device and other objects that don't require running GPU commands.
+ - It then calls `DeviceBase::Initialize`, which enables the `DeviceBase` facilities and sets the `State` to `Alive`.
+ - Optionally, `backend::Device::Initialize` can now enqueue GPU commands for its initialization.
+ - The device is ready to be used by the application!
+
+While the device is `Alive`, the backend can notify that it has been disconnected, in which case it jumps directly to the `Disconnected` state.
+Internal errors or a call to `LoseForTesting` can also disconnect the device, but in that case commands are still running in the underlying API, so the frontend will finish all commands (with `WaitForIdleForDestruction`) and prevent any new commands from being enqueued (by setting the state to `BeingDisconnected`).
+After this the device is set to the `Disconnected` state.
+If an `Alive` device is destroyed, then a flow similar to that of `LoseForTesting` happens.
+
+All this ensures that during destruction or forceful disconnect of the device, it properly gets to the `Disconnected` state with no commands executing on the GPU.
+After disconnecting, the frontend will call `backend::Device::ShutDownImpl` so that the backend can properly free driver objects.
+
+### Toggles
+
+Toggles are booleans that control code paths inside of Dawn, like lazy-clearing resources or using D3D12 render passes.
+They aren't just local booleans next to the code path they control, because embedders of Dawn like Chromium want to be able to surface which toggles are used by a device (like in about:gpu).
+
+Toggles are to be used for any optional code path in Dawn, including:
+
+ - Workarounds for driver bugs.
+ - Disabling select parts of the validation or robustness.
+ - Enabling limitations that help with testing.
+ - Using more advanced or optional backend API features.
+
+Toggles can be queried using `DeviceBase::IsToggleEnabled`:
+```
+bool useRenderPass = device->IsToggleEnabled(Toggle::UseD3D12RenderPass);
+```
+
+Toggles are defined in a table in [Toggles.cpp](../src/dawn_native/Toggles.cpp) that also includes their name and description.
+The name can be used to force-enable a toggle or, on the contrary, to force-disable it.
+This is particularly useful in tests so that the two sides of a code path can be tested (for example, with and without D3D12 render passes).
+
+Here's an example of a test that is run in the D3D12 backend both in the default configuration and with the D3D12 render passes forcibly disabled:
+```
+DAWN_INSTANTIATE_TEST(RenderPassTest,
+                      D3D12Backend(),
+                      D3D12Backend({}, {"use_d3d12_render_pass"}));
+// The {} is the list of force enabled toggles, {"..."} the force disabled ones.
+```
+
+The initialization order of toggles looks as follows:
+
+ - The toggle overrides from the device descriptor are applied.
+ - The frontend device default toggles are applied (unless already overridden).
+ - The backend device default toggles are applied (unless already overridden) using `DeviceBase::SetToggle`.
+ - The backend device can ignore overridden toggles that it can't support by using `DeviceBase::ForceSetToggle`.
+
+Forcing toggles should only be done when there is no "safe" option for the toggle.
+This is to avoid crashes during testing when the tests try to use both sides of a toggle.
+For toggles that are safe to enable, like workarounds, the tests can run against the base configuration and with the toggle enabled.
+For toggles that are safe to disable, like using more advanced backing API features, the tests can run against the base configuration and with the toggle disabled.
+
+### Immutable object caches
+
+A number of WebGPU objects are immutable once created and can be expensive to create, like pipelines.
+`DeviceBase` contains caches for these objects so that they are nearly free to create the second time.
+This is also useful to be able to compare objects by pointer, like `BindGroupLayouts`, since two BGLs are equal iff they are the same object.
+
+### Format Tables
+
+The frontend has a `Format` structure that represents all the information that is known about a particular WebGPU format for this `Device` based on the enabled extensions.
+Formats are precomputed at device initialization and can be queried from a WebGPU format either assuming the format is a valid enum, or in a safe manner that doesn't make this assumption.
+A reference to these formats can be stored persistently as they have the same lifetime as the `Device`.
+
+Formats also have an "index" so that backends can create parallel tables for internal information about formats, like what they translate to in the backing API.
+
+### Object factory
+
+Like WebGPU's device object, `DeviceBase` is a factory with methods to create all kinds of other WebGPU objects.
+WebGPU has some objects that aren't created from the device, like the texture view, but in Dawn these creations also go through `DeviceBase` so that there is a single factory for each backend.
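
Note on the "Device Lifecycle" section of the new doc: the transitions it describes may be easier to follow as a small state machine. The following is only a minimal sketch under stated assumptions. The state names come from the text, but `DeviceSketch`, its methods, and its event log are hypothetical and are not Dawn's actual `DeviceBase` code. In particular, running the backend shutdown at the end of `Destroy` is an assumption about where `ShutDownImpl` would be called.

```cpp
#include <cassert>
#include <string>
#include <vector>

// State names follow the doc; everything else here is illustrative.
enum class State { BeingCreated, Alive, BeingDisconnected, Disconnected };

class DeviceSketch {
  public:
    // Like the constructor described in the doc: only sets the initial state.
    DeviceSketch() : mState(State::BeingCreated) {}

    // Stands in for the point where initialization succeeds.
    void Initialize() {
        assert(mState == State::BeingCreated);
        mState = State::Alive;
    }

    // New commands are only accepted while the device is alive.
    bool CanEnqueueCommands() const { return mState == State::Alive; }

    // TDR or hot-unplug: GPU work has already vanished, so there is
    // nothing to wait for and the device jumps straight to Disconnected.
    void HandleBackendDisconnect() {
        if (mState == State::Alive) {
            mState = State::Disconnected;
        }
    }

    // Internal error or a test-triggered loss: commands may still be
    // running, so block new ones, wait for idle, then disconnect.
    void Lose() {
        if (mState != State::Alive) {
            return;
        }
        mState = State::BeingDisconnected;
        WaitForIdle();
        mState = State::Disconnected;
    }

    // Destruction must be safe in every state, including a device that
    // never finished initializing (still BeingCreated).
    void Destroy() {
        Lose();  // No-op unless the device is alive.
        mState = State::Disconnected;
        // Assumption: driver objects are freed here, after disconnecting.
        mLog.push_back("shut down backend");
    }

    State GetState() const { return mState; }
    const std::vector<std::string>& GetLog() const { return mLog; }

  private:
    void WaitForIdle() { mLog.push_back("wait for idle"); }

    State mState;
    std::vector<std::string> mLog;
};
```

In this sketch, destroying a device that never left `BeingCreated` skips the wait and only records the shutdown step, while destroying an alive device records the wait before the shutdown. A device disconnected by the backend also skips the wait, matching the doc's point that its commands "just disappeared".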
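
Note on the "Toggles" section of the new doc: the initialization order boils down to a precedence rule between descriptor overrides, defaults, and forced values. The sketch below is one way to picture that rule, not Dawn's implementation. Only the method names `SetToggle`, `ForceSetToggle`, and `IsToggleEnabled` come from the text. The `ToggleSketch` class, the `ApplyOverride` helper, and the use of string keys instead of a `Toggle` enum are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of toggle precedence; not the real DeviceBase API.
class ToggleSketch {
  public:
    // Step 1: an override from the device descriptor (force-enabled or
    // force-disabled). It is remembered so later defaults cannot undo it.
    void ApplyOverride(const std::string& name, bool enabled) {
        mValues[name] = enabled;
        mOverridden.insert(name);
    }

    // Steps 2 and 3: a frontend or backend default. It is ignored when
    // the toggle was already overridden by the descriptor.
    void SetToggle(const std::string& name, bool enabled) {
        if (mOverridden.count(name) == 0) {
            mValues[name] = enabled;
        }
    }

    // Step 4: the backend cannot support the requested value, so the
    // forced value wins even over a descriptor override.
    void ForceSetToggle(const std::string& name, bool enabled) {
        mValues[name] = enabled;
    }

    // Toggles that were never set are treated as disabled in this sketch.
    bool IsToggleEnabled(const std::string& name) const {
        auto it = mValues.find(name);
        return it != mValues.end() && it->second;
    }

  private:
    std::map<std::string, bool> mValues;
    std::set<std::string> mOverridden;
};
```

With this model, a test that force-disables `use_d3d12_render_pass` keeps it disabled even if a backend default later calls `SetToggle` to enable it, and only `ForceSetToggle` can change it afterwards. That is why the doc recommends forcing only when no "safe" value exists: a forced toggle silently defeats the test's override.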