Add translation scheme for HLSL-style input/outputs Add design doc describing how to translate from SPIR-V Vulkan style inputs/outputs to WGSL-style inputs as shader entry points, and outputs as shader return value. Change-Id: If0dc5f07542ccf2db95ed648a34f73a06e20a5a4 Reviewed-on: https://dawn-review.googlesource.com/c/tint/+/38961 Reviewed-by: Alan Baker <alanbaker@google.com> Commit-Queue: David Neto <dneto@google.com>
diff --git a/docs/spirv-input-output-variables.md b/docs/spirv-input-output-variables.md new file mode 100644 index 0000000..4814dda --- /dev/null +++ b/docs/spirv-input-output-variables.md
@@ -0,0 +1,264 @@ +# SPIR-V translation of shader input and output variables + +WGSL [MR 1315](https://github.com/gpuweb/gpuweb/issues/1315) changed WGSL so +that pipeline inputs and outputs are handled similarly to HLSL: + +- Shader pipeline inputs are the WGSL entry point function arguments. +- Shader pipeline outputs are the WGSL entry point return value. + +Note: In both cases, a struct may be used to pack multiple values together. +In that case, I/O-specific attributes appear on struct members at the struct declaration. + +Resource variables, e.g. buffers, samplers, and textures, are still declared +as variables at module scope. + +## Vulkan SPIR-V today + +SPIR-V for Vulkan models inputs and outputs as module-scope variables in +the Input and Output storage classes, respectively. + +The `OpEntryPoint` instruction has a list of module-scope variables that must +be a superset of all the input and output variables that are statically +accessed in the shader call tree. +From SPIR-V 1.4 onward, all interface variables that might be statically accessed +must appear on that list. +That includes all resource variables that might be statically accessed +by the shader call tree. + +## Translation scheme for SPIR-V to WGSL + +A translation scheme from SPIR-V to WGSL is as follows: + +Each SPIR-V entry point maps to a set of Private variables proxying the +inputs and outputs, and two functions: + +- An inner function with no arguments or return values, and whose body + is the same as the original SPIR-V entry point. +- Original input variables are mapped to pseudo-in Private variables + with the same store types, but no other attributes or properties are copied. +- Original output variables are mapped to pseudo-out Private variables + with the same store types, but no other attributes or properties are copied. 
+- A wrapper entry point function whose arguments correspond in type, location, + and builtin attributes to the original input variables, and whose return type is + a structure whose members correspond in type, location, and builtin + attributes to the original output variables. + The body of the wrapper function has the following phases: + - Copy formal parameter values into pseudo-in variables. + - Use stores to initialize pseudo-out variables: + - If the original variable had an initializer, store that value. + - Otherwise, store a zero value for the store type. + - Execute the inner function. + - Copy pseudo-out variables into the return structure. + - Return the return structure. + +- Replace uses of the original input/output variables with uses of the pseudo-in and + pseudo-out variables, respectively. +- Remap pointer-to-Input to pointer-to-Private +- Remap pointer-to-Output to pointer-to-Private + +We are not concerned with the cost of the extra copying of input/output values. +First, the pipeline inputs/outputs tend to be small. +Second, we expect the backend compiler in the driver will be able to see +through the copying and optimize the result. + +### Example + + +```glsl + #version 450 + + layout(location = 0) out vec4 frag_colour; + layout(location = 0) in vec4 the_colour; + + void bar() { + frag_colour = the_colour; + } + + void main() { + bar(); + } +``` + +Current translation, through SPIR-V, SPIR-V reader, WGSL writer: + +```groovy + [[location(0)]] var<out> frag_colour : vec4<f32>; + [[location(0)]] var<in> the_colour : vec4<f32>; + + fn bar_() -> void { + const x_14 : vec4<f32> = the_colour; + frag_colour = x_14; + return; + } + + [[stage(fragment)]] + fn main() -> void { + bar_(); + return; + } +``` + +Proposed translation, through SPIR-V, SPIR-V reader, WGSL writer: + +```groovy + // 'in' and 'out' variables are now 'private'. 
+ var<private> frag_colour : vec4<f32>; + var<private> the_colour : vec4<f32>; + + fn bar_() -> void { + // Accesses to the module-scope variables do not change. + // This is a big simplifying advantage. + const x_14 : vec4<f32> = the_colour; + frag_colour = x_14; + return; + } + + fn main_inner() -> void { + bar_(); + return; + } + + // Declare a structure type to collect the return values. + struct main_result_type { + [[location(0)]] frag_colour : vec4<f32>; + }; + + [[stage(fragment)]] + fn main( + + // 'in' variables are entry point parameters + [[location(0)]] the_colour_arg : vec4<f32> + + ) -> main_result_type { + + // Save 'in' arguments to 'private' variables. + the_colour = the_colour_arg; + + // Initialize 'out' variables. + // Use the zero value, since no initializer was specified. + frag_colour = vec4<f32>(); + + // Invoke the original entry point. + main_inner(); + + // Collect outputs into a structure and return it. + var result : main_result_type; + result.frag_colour = frag_colour; + return result; + } +``` + +Alternatively, we could emit the body of the original entry point at +the point of invocation. +However, that is more complex because the original entry point function +may return from multiple locations, and we would like to have only +a single exit path to construct and return the result value. + +### Handling fragment discard + +In SPIR-V, `OpKill` causes immediate termination of the shader. +Is the shader obligated to write its outputs when `OpKill` is executed? + +The Vulkan fragment operations are as follows +(see [6. Fragment operations](https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#fragops)): + +* Scissor test +* Sample mask test +* Fragment shading +* Multisample coverage +* Depth bounds test +* Stencil test +* Depth test +* Sample counting +* Coverage reduction + +After that, the fragment results are used to update output attachments, including +colour, depth, and stencil attachments. 
+ +Vulkan says: + +> If a fragment operation results in all bits of the coverage mask being 0, +> the fragment is discarded, and no further operations are performed. +> Fragments can also be programmatically discarded in a fragment shader by executing one of +> +> OpKill. + +I interpret this to mean that the outputs of a discarded fragment are ignored. + +Therefore, `OpKill` does not require us to modify the basic scheme from the previous +section. + +The `OpDemoteToHelperInvocationEXT` +instruction is an alternative way to throw away a fragment, but one that +does not immediately terminate execution of the invocation. +It is introduced in the [`SPV_EXT_demote_to_helper_invocation`](http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/EXT/SPV_EXT_demote_to_helper_invocation.html) +extension. WGSL does not have this feature, but we expect it will be introduced by a +future WGSL extension. The same analysis applies to demote-to-helper. When introduced, +it will not affect translation of pipeline outputs. + +### Handling depth-replacing mode + +A Vulkan fragment shader must write to the fragment depth builtin if and only if it +has a `DepthReplacing` execution mode. Otherwise behaviour is undefined. + +We will ignore the case where the SPIR-V shader writes to the `FragDepth` builtin +and then discards the fragment. +This is justified because "no further operations" are performed by the pipeline +after the fragment is discarded, and that includes writing to depth output attachments. + +Assuming the shader is valid, no special translation is required. + +### Handling output sample mask + +By the same reasoning as for depth-replacing, it is ok to incidentally not write +to the sample-mask builtin variable when the fragment is discarded. + +### Handling clip distance and cull distance + +Most builtin variables are scalars or vectors. 
+However, the `ClipDistance` and `CullDistance` builtin variables are arrays of 32-bit float values. +Each entry defines a clip half-plane (respectively cull half-plane). +A Vulkan implementation must support array sizes of up to 8 elements. + +How prevalent are shaders that use these features? +These variables are supported when Vulkan features `shaderClipDistance` and `shaderCullDistance` +are supported. +According to gpuinfo.org as of this writing, those +Vulkan features appear to be nearly universally supported on Windows devices (>99%), +but on only 70% of Android devices. +It appears that Qualcomm devices support them, but Mali devices do not (e.g. Mali-G77). + +The proposed translation scheme forces a copy of each array from private +variables into the return value of a vertex shader, or into a private +variable of a fragment shader. +In addition to the register pressure, there may be a performance degradation +due to the bulk copying of data. + +We think this is an acceptable tradeoff for the gain in usability and +consistency with other pipeline inputs and outputs. + +## Translation scheme for WGSL AST to SPIR-V + +To translate from the WGSL AST to SPIR-V, do the following: + +- Each entry point formal parameter is mapped to a SPIR-V `Input` variable. + - Struct and array inputs may have to be broken down into individual variables. +- The return of the entry point is broken down into fields, with one + `Output` variable per field. +- In the above, builtins must be separated from user attributes. + - Builtin attributes are moved to the corresponding variable. + - Location and interpolation attributes are moved to the corresponding + variables. +- This translation relies on the fact that pipeline inputs and pipeline + outputs are IO-shareable types. IO-shareable types are always storable, + and can be the store type of input/output variables. 
+- Input function parameters will be automatically initialized by the system + as part of setting up the pipeline inputs to the entry point. +- Replace each return statement in the entry point with a code sequence + which writes the return value components to the synthesized output variables, + and then executes an `OpReturn` (without value). + +This translation is sufficient even for fragment shaders with discard. +In that case, outputs will be ignored because downstream pipeline +operations will not be performed. +This is the same rationale as for translation from SPIR-V to WGSL AST.
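
The WGSL AST to SPIR-V steps above can be modelled with a small Python sketch. This is an illustration under stated assumptions, not Tint's implementation: the `Member` and `InterfaceVar` types, the helper names, and the pseudo-SPIR-V strings are all hypothetical. It shows each parameter becoming one `Input` variable, each return field becoming one `Output` variable, builtin and location attributes moving onto the synthesized variable, and a return statement becoming stores followed by `OpReturn`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Member:
    """One pipeline I/O value: an entry point parameter or a return-struct field.

    Hypothetical model type, not part of any real API.
    """
    name: str
    type: str
    location: Optional[int] = None  # set for user-defined I/O
    builtin: Optional[str] = None   # set for builtin I/O


@dataclass(frozen=True)
class InterfaceVar:
    """A synthesized module-scope variable (illustrative model only)."""
    name: str
    storage_class: str  # "Input" or "Output"
    type: str
    decoration: str


def decoration_for(m: Member) -> str:
    """Move the builtin or location attribute onto the synthesized variable."""
    if (m.builtin is None) == (m.location is None):
        raise ValueError(f"{m.name}: need exactly one of builtin or location")
    if m.builtin is not None:
        return f"BuiltIn {m.builtin}"
    return f"Location {m.location}"


def lower_interface(params: List[Member], results: List[Member]) -> List[InterfaceVar]:
    """One Input variable per parameter, one Output variable per return field."""
    inputs = [InterfaceVar(p.name, "Input", p.type, decoration_for(p)) for p in params]
    outputs = [InterfaceVar(r.name, "Output", r.type, decoration_for(r)) for r in results]
    return inputs + outputs


def lower_return(results: List[Member]) -> List[str]:
    """Replace `return value;` with one store per field, then a valueless return."""
    stores = [f"OpStore %{r.name} %value_{r.name}" for r in results]
    return stores + ["OpReturn"]
```

For the fragment shader in the earlier example, `lower_interface([Member("the_colour_arg", "vec4<f32>", location=0)], [Member("frag_colour", "vec4<f32>", location=0)])` would yield one `Input` and one `Output` variable, both decorated with `Location 0`. The sketch leaves out the splitting of nested struct and array inputs, as well as interpolation attributes.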