Start adding docs about device facilities.

Bug: dawn:373
Change-Id: I837b0fe15ff98d58caf6b69ea6d8d92bee33e52e
Reviewed-on: https://dawn-review.googlesource.com/c/dawn/+/18762
Commit-Queue: Corentin Wallez <cwallez@chromium.org>
Reviewed-by: Kai Ninomiya <kainino@chromium.org>
diff --git a/docs/device_facilities.md b/docs/device_facilities.md
new file mode 100644
index 0000000..ae75323
--- /dev/null
+++ b/docs/device_facilities.md
@@ -0,0 +1,106 @@
+# Devices
+
+In Dawn the `Device` is a "god object" that contains a lot of facilities useful for the whole object graph that descends from it.
+There a number of facilities common to all backends that live in the frontend and backend-specific facilities.
+Example of frontend facilities are the management of content-less object caches, or the toggle management.
+Example of backend facilities are GPU memory allocators or the backing API function pointer table.
+
+## Frontend facilities
+
+### Error Handling
+
+Dawn (dawn_native) uses the [Error.h](../src/dawn_native/Error.h) error handling to robustly handle errors.
+With `DAWN_TRY` errors bubble up all the way to, and are "consumed" by the entry-point that was called by the application.
+Error consumption uses `Device::ConsumeError` that expose them via the WebGPU "error scopes" and can also influence the device lifecycle by notifying of a device loss, or triggering a device loss..
+
+See [Error.h](../src/dawn_native/Error.h) for more information about using errors.
+
+### Device Lifecycle
+
+The device lifecycle is a bit more complicated than other objects in Dawn for multiple reasons:
+
+ - The device initialization creates facilities in both the backend and the frontend, which can fail.
+   When a device fails to initialize, it should still be possible to destroy it without crashing.
+ - Execution of commands on the GPU must be finished before the device can be destroyed (because there's noone to "DeleteWhenUnused" the device).
+ - On creation a device might want to run some GPU commands (like initializing zero-buffers), which must be completed before it is destroyed.
+ - A device can become "disconnected" when a TDR or hot-unplug happens.
+   In this case, destruction of the device doesn't need to wait on GPU commands to finish because they just disappeared.
+
+There is a state machine `State` defined in [Device.h](../src/dawn_native/Device.h) that controls all of the above.
+The most common state is `Alive` when there are potentially GPU commands executing.
+
+Initialization of a device looks like the following:
+
+ - `DeviceBase::DeviceBase` is called and does mostly nothing except setting `State` to `BeingCreated` (and initial toggles).
+ - `backend::Device::Initialize` creates things like the underlying device and other stuff that doesn't run GPU commands.
+ - It then calls `DeviceBase::Initialize` that enables the `DeviceBase` facilities and sets the `State` to `Alive`.
+ - Optionally, `backend::Device::Initialize` can now enqueue GPU commands for its initialization.
+ - The device is ready to be used by the application!
+
+While it is `Alive` the device can notify it has been disconnected by the backend, in which case it jumps directly to the `Disconnected` state.
+Internal errors, or a call to `LoseForTesting` can also disconnect the device, but in the underlying API commands are still running, so the frontend will finish all commands (with `WaitForIdleForDesctruction`) and prevent any new commands to be enqueued (by setting state to `BeingDisconnected`).
+After this the device is set in the `Disconnected` state.
+If an `Alive` device is destroyed, then a similar flow to `LoseForTesting happens`.
+
+All this ensures that during destruction or forceful disconnect of the device, it properly gets to the `Disconnected` state with no commands executing on the GPU.
+After disconnecting, frontend will call `backend::Device::ShutDownImpl` so that it can properly free driver objects.
+
+### Toggles
+
+Toggles are booleans that control code paths inside of Dawn, like lazy-clearing resources or using D3D12 render passes.
+They aren't just booleans close to the code path they control, because embedders of Dawn like Chromium want to be able to surface what toggles are used by a device (like in about:gpu).
+
+Toogles are to be used for any optional code path in Dawn, including:
+
+ - Workarounds for driver bugs.
+ - Disabling select parts of the validation or robustness.
+ - Enabling limitations that help with testing.
+ - Using more advanced or optional backend API features.
+
+Toggles can be queried using `DeviceBase::IsToggleEnabled`:
+```
+bool useRenderPass = device->IsToggleEnabled(Toggle::UseD3D12RenderPass);
+```
+
+Toggles are defined in a table in [Toggles.cpp](../src/dawn_native/Toggles.cpp) that also includes their name and description.
+The name can be used to force enabling of a toggle or, at the contrary, force the disabling of a toogle.
+This is particularly useful in tests so that the two sides of a code path can be tested (for example using D3D12 render passes and not).
+
+Here's an example of a test that is run in the D3D12 backend both with the D3D12 render passes forcibly disabled, and in the default configuration.
+```
+DAWN_INSTANTIATE_TEST(RenderPassTest,
+                      D3D12Backend(),
+                      D3D12Backend({}, {"use_d3d12_render_pass"}));
+// The {} is the list of force enabled toggles, {"..."} the force disabled ones.
+```
+
+The initialization order of toggles looks as follows:
+
+ - The toggles overrides from the device descriptor are applied.
+ - The frontend device default toggles are applied (unless already overriden).
+ - The backend device default toggles are applied (unless already overriden) using `DeviceBase::SetToggle`
+ - The backend device can ignore overriden toggles if it can't support them by using `DeviceBase::ForceSetToggle`
+
+Forcing toggles should only be done when there is no "safe" option for the toggle.
+This is to avoid crashes during testing when the tests try to use both sides of a toggle.
+For toggles that are safe to enable, like workarounds, the tests can run against the base configuration and with the toggle enabled.
+For toggles that are safe to disable, like using more advanced backing API features, the tests can run against the base configuation and with the toggle disabled.
+
+### Immutable object caches
+
+A number of WebGPU objects are immutable once created, and can be expensive to create, like pipelines.
+`DeviceBase` contains caches for these objects so that they are free to create the second time.
+This is also useful to be able to compare objects by pointers like `BindGroupLayouts` since two BGLs would be equal iff they are the same object.
+
+### Format Tables
+
+The frontend has a `Format` structure that represent all the information that are known about a particular WebGPU format for this Device based on the enabled extensions.
+Formats are precomputed at device initialization and can be queried from a WebGPU format either assuming the format is a valid enum, or in a safe manner that doesn't do this assumption.
+A reference to these formats can be stored persistently as they have the same lifetime as the `Device`.
+
+Formats also have an "index" so that backends can create parallel tables for internal informations about formats, like what they translate to in the backing API.
+
+### Object factory
+
+Like WebGPU's device object, `DeviceBase` is an factory with methods to create all kinds of other WebGPU objects.
+WebGPU has some objects that aren't created from the device, like the texture view, but in Dawn these creations also go through `DeviceBase` so that there is a single factory for each backend.