Add translation scheme for HLSL-style input/outputs Add design doc describing how to translate from SPIR-V Vulkan style inputs/outputs to WGSL-style inputs as shader entry points, and outputs as shader return value. Change-Id: If0dc5f07542ccf2db95ed648a34f73a06e20a5a4 Reviewed-on: https://dawn-review.googlesource.com/c/tint/+/38961 Reviewed-by: Alan Baker <alanbaker@google.com> Commit-Queue: David Neto <dneto@google.com>

diff --git a/docs/spirv-input-output-variables.md b/docs/spirv-input-output-variables.md
new file mode 100644
index 0000000..4814dda
--- /dev/null
+++ b/docs/spirv-input-output-variables.md

@@ -0,0 +1,264 @@
+# SPIR-V translation of shader input and output variables
+
+WGSL [MR 1315](https://github.com/gpuweb/gpuweb/issues/1315) changed WGSL so
+that pipeline inputs and outputs are handled similar to HLSL:
+
+- Shader pipeline inputs are the WGSL entry point function arguments.
+- Shader pipeline outputs are the WGSL entry point return value.
+
+Note: In both cases, a struct may be used to pack multiple values together.
+In that case, I/O specific attributes appear on struct members at the struct declaration.
+
+Resource variables, e.g. buffers, samplers, and textures, are still declared
+as variables at module scope.
+
+## Vulkan SPIR-V today
+
+SPIR-V for Vulkan models inputs and outputs as module-scope variables in
+the Input and Output storage classes, respectively.
+
+The `OpEntryPoint` instruction has a list of module-scope variables that must
+be a superset of all the input and output variables that are statically
+accessed in the shader call tree.
+From SPIR-V 1.4 onward, all interface variables that might be statically accessed
+must appear on that list.
+So that includes all resource variables that might be statically accessed
+by the shader call tree.
+
+## Translation scheme for SPIR-V to WGSL
+
+A translation scheme from SPIR-V to WGSL is as follows:
+
+Each SPIR-V entry point maps to a set of Private variables proxying the
+inputs and outputs, and two functions:
+
+- An inner function with no arguments or return values, and whose body
+  is the same as the original SPIR-V entry point.
+- Original input variables are mapped to pseudo-in Private variables
+  with the same store types, but no other attributes or properties copied.
+- Original output variables are mapped to pseudo-out Private variables
+  with the same store types, but no other attributes or properties are copied.
+- A wrapper entry point function whose arguments correspond in type, location
+  and builtin attributes the original input variables, and whose return type is
+  a structure containing members correspond in type, location, and builtin
+  attributes to the original output variables.
+  The body of the wrapper function the following phases:
+  - Copy formal parameter values into pseudo-in variables.
+  - Use stores to initialize pseudo-out variables:
+    - If the original variable had an initializer, store that value.
+    - Otherwise, store a zero value for the store type.
+  - Execute the inner function.
+  - Copy pseudo-out variables into the return structure.
+  - Return the return structure.
+
+- Replace uses of the the original input/output variables to the pseudo-in and
+  pseudo-out variables, respectively.
+- Remap pointer-to-Input with pointer-to-Private
+- Remap pointer-to-Output with pointer-to-Private
+
+We are not concerned with the cost of extra copying input/output values.
+First, the pipeline inputs/outputs tend to be small.
+Second, we expect the backend compiler in the driver will be able to see
+through the copying and optimize the result.
+
+### Example
+
+
+```glsl
+    #version 450
+
+    layout(location = 0) out vec4 frag_colour;
+    layout(location = 0) in vec4 the_colour;
+
+    void bar() {
+      frag_colour = the_colour;
+    }
+
+    void main() {
+        bar();
+    }
+```
+
+Current translation, through SPIR-V, SPIR-V reader, WGSL writer:
+
+```groovy
+    [[location(0)]] var<out> frag_colour : vec4<f32>;
+    [[location(0)]] var<in> the_colour : vec4<f32>;
+
+    fn bar_() -> void {
+      const x_14 : vec4<f32> = the_colour;
+      frag_colour = x_14;
+      return;
+    }
+
+    [[stage(fragment)]]
+    fn main() -> void {
+      bar_();
+      return;
+    }
+```
+
+Proposed translation, through SPIR-V, SPIR-V reader, WGSL writer:
+
+```groovy
+    // 'in' variables are now 'private'.
+    var<private> frag_colour : vec4<f32>;
+    var<private> the_colour : vec4<f32>;
+
+    fn bar_() -> void {
+      // Accesses to the module-scope variables do not change.
+      // This is a big simplifying advantage.
+      const x_14 : vec4<f32> = the_colour;
+      frag_colour = x_14;
+      return;
+    }
+
+    fn main_inner() -> void {
+      bar_();
+      return;
+    }
+
+    // Declare a structure type to collect the return values.
+    struct main_result_type {
+      [[location(0)]] frag_color : vec4<f32>;
+    };
+
+    [[stage(fragment)]]
+    fn main(
+
+      // 'in' variables are entry point parameters
+      [[location(0)]] the_color_arg : vec4<f32>
+
+    ) -> main_result_type {
+
+      // Save 'in' arguments to 'private' variables.
+      the_color = the_color_arg;
+
+      // Initialize 'out' variables.
+      // Use the zero value, since no initializer was specified.
+      frag_color = vec4<f32>();
+
+      // Invoke the original entry point.
+      main_inner();
+
+      // Collect outputs into a structure and return it.
+      var result : main_outer_result_type;
+      result.frag_color = frag_color;
+      return result;
+    }
+```
+
+Alternately, we could emit the body of the original entry point at
+the point of invocation.
+However that is more complex because the original entry point function
+may return from multiple locations, and we would like to have only
+a single exit path to construct and return the result value.
+
+### Handling fragment discard
+
+In SPIR-V `OpKill` causes immediate termination of the shader.
+Is the shader obligated to write its outputs when `OpKill` is executed?
+
+The Vulkan fragment operations are as follows:
+(see [6. Fragment operations](https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#fragops)).
+
+* Scissor test
+* Sample mask test
+* Fragment shading
+* Multisample coverage
+* Depth bounds test
+* Stencil test
+* Depth test
+* Sample counting
+* Coverage reduction
+
+After that, the fragment results are used to update output attachments, including
+colour, depth, and stencil attachments.
+
+Vulkan says:
+
+> If a fragment operation results in all bits of the coverage mask being 0,
+> the fragment is discarded, and no further operations are performed.
+> Fragments can also be programmatically discarded in a fragment shader by executing one of
+>
+>     OpKill.
+
+I interpret this to mean that the outputs of a discarded fragment are ignored.
+
+Therefore, `OpKill` does not require us to modify the basic scheme from the previous
+section.
+
+The `OpDemoteToHelperInvocationEXT`
+instruction is an alternative way to throw away a fragment, but which
+does not immediately terminate execution of the invocation.
+It is introduced in the [`SPV_EXT_demote_to_helper_invocation](http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/EXT/SPV_EXT_demote_to_helper_invocation.html)
+extension.  WGSL does not have this feature, but we expect it will be introduced by a
+future WGSL extension.  The same analysis applies to demote-to-helper.  When introduced,
+it will not affect translation of pipeline outputs.
+
+### Handling depth-replacing mode
+
+A Vulkan fragment shader must write to the fragment depth builtin if and only if it
+has a `DepthReplacing` execution mode. Otherwise behaviour is undefined.
+
+We will ignore the case where the SPIR-V shader writes to the `FragDepth` builtin
+and then discards the fragment.
+This is justified because "no further operations" are performed by the pipeline
+after the fragment is discarded, and that includes writing to depth output attachments.
+
+Assuming the shader is valid, no special translation is required.
+
+### Handling output sample mask
+
+By the same reasoning as for depth-replacing, it is ok to incidentally not write
+to the sample-mask builtin variable when the fragment is discarded.
+
+### Handling clip distance and cull distance
+
+Most builtin variables are scalars or vectors.
+However, the `ClipDistance` and `CullDistance` builtin variables are arrays of 32-bit float values.
+Each entry defines a clip half-plane (respectively cull half-plane)
+A Vulkan implementation must support array sizes of up to 8 elements.
+
+How prevalent are shaders that use these features?
+These variables are supported when Vulkan features `shaderClipDistance` and `shaderCullDistance`
+are supported.
+According to gpuinfo.org as of this writing, those
+Vulkan features appear to be nearly universally supported on Windows devices (>99%),
+but by only 70% on Android.
+It appears that Qualcomm devices support them, but Mali devices do not (e.g. Mali-G77).
+
+The proposed translation scheme forces a copy of each array from private
+variables into the return value of a vertex shader, or into a private
+variable of a fragment shader.
+In addition to the register pressure, there may be a performance degradation
+due to the bulk copying of data.
+
+We think this is an acceptable tradeoff for the gain in usability and
+consistency with other pipeline inputs and outputs.
+
+## Translation scheme for WGSL AST to SPIR-V
+
+To translate from the WGSL AST to SPIR-V, do the following:
+
+- Each entry point formal parameter is mapped to a SPIR-V `Input` variable.
+  - Struct and array inputs may have to be broken down into individual variables.
+- The return of the entry point is broken down into fields, with one
+  `Output` variable per field.
+- In the above, builtins must be separated from user attributes.
+  - Builtin attributes are moved to the corresponding variable.
+  - Location and interpolation attributes are moved to the corresponding
+    variables.
+- This translation relies on the fact that pipeline inputs and pipeline
+  outputs are IO-shareable types. IO-shareable types are always storable,
+  and can be the store type of input/output variables.
+- Input function parameters will be automatically initialized by the system
+  as part of setting up the pipeline inputs to the entry point.
+- Replace each return statement in the entry point with a code sequence
+  which writes the return value components to the synthesized output variables,
+  and then executes an `OpReturn` (without value).
+
+This translation is sufficient even for fragment shaders with discard.
+In that case, outputs will be ignored because downstream pipeline
+operations will not be performed.
+This is the same rationale as for translation from SPIR-V to WGSL AST.