docs/tint/spirv-input-output-variables.md - dawn - Git at Google

 # SPIR-V translation of shader input and output variables

 WGSL [MR 1315](https://github.com/gpuweb/gpuweb/issues/1315) changed WGSL so
 that pipeline inputs and outputs are handled similar to HLSL:

 - Shader pipeline inputs are the WGSL entry point function arguments.
 - Shader pipeline outputs are the WGSL entry point return value.

 Note: In both cases, a struct may be used to pack multiple values together.
 In that case, I/O specific attributes appear on struct members at the struct declaration.

 Resource variables, e.g. buffers, samplers, and textures, are still declared
 as variables at module scope.

 ## Vulkan SPIR-V today

 SPIR-V for Vulkan models inputs and outputs as module-scope variables in
 the Input and Output storage classes, respectively.

 The `OpEntryPoint` instruction has a list of module-scope variables that must
 be a superset of all the input and output variables that are statically
 accessed in the shader call tree.
 From SPIR-V 1.4 onward, all interface variables that might be statically accessed
 must appear on that list.
 So that includes all resource variables that might be statically accessed
 by the shader call tree.

 ## Translation scheme for SPIR-V to WGSL

 A translation scheme from SPIR-V to WGSL is as follows:

 Each SPIR-V entry point maps to a set of Private variables proxying the
 inputs and outputs, and two functions:

 - An inner function with no arguments or return values, and whose body
   is the same as the original SPIR-V entry point.
 - Original input variables are mapped to pseudo-in Private variables
   with the same store types, but no other attributes or properties copied.
   In Vulkan, Input variables don't have initalizers.
 - Original output variables are mapped to pseudo-out Private variables
   with the same store types and optional initializer, but no other attributes
   or properties are copied.
 - A wrapper entry point function whose arguments correspond in type, location
   and builtin attributes the original input variables, and whose return type is
   a structure containing members correspond in type, location, and builtin
   attributes to the original output variables.
   The body of the wrapper function the following phases:
   - Copy formal parameter values into pseudo-in variables.
     - Insert a bitcast if the WGSL builtin variable has different signedness
       from the SPIR-V declared type.
   - Execute the inner function.
   - Copy pseudo-out variables into the return structure.
     - Insert a bitcast if the WGSL builtin variable has different signedness
       from the SPIR-V declared type.
   - Return the return structure.

 - Replace uses of the the original input/output variables to the pseudo-in and
   pseudo-out variables, respectively.
 - Remap pointer-to-Input with pointer-to-Private
 - Remap pointer-to-Output with pointer-to-Private

 We are not concerned with the cost of extra copying input/output values.
 First, the pipeline inputs/outputs tend to be small.
 Second, we expect the backend compiler in the driver will be able to see
 through the copying and optimize the result.

 ### Example


 ```glsl
     #version 450

     layout(location = 0) out vec4 frag_colour;
     layout(location = 0) in vec4 the_colour;

     void bar() {
       frag_colour = the_colour;
     }

     void main() {
         bar();
     }
 ```

 Current translation, through SPIR-V, SPIR-V reader, WGSL writer:

 ```groovy
     @location(0) var<out> frag_colour : vec4<f32>;
     @location(0) var<in> the_colour : vec4<f32>;

     fn bar_() -> void {
       const x_14 : vec4<f32> = the_colour;
       frag_colour = x_14;
       return;
     }

     @fragment
     fn main() -> void {
       bar_();
       return;
     }
 ```

 Proposed translation, through SPIR-V, SPIR-V reader, WGSL writer:

 ```groovy
     // 'in' variables are now 'private'.
     var<private> frag_colour : vec4<f32>;
     var<private> the_colour : vec4<f32>;

     fn bar_() -> void {
       // Accesses to the module-scope variables do not change.
       // This is a big simplifying advantage.
       const x_14 : vec4<f32> = the_colour;
       frag_colour = x_14;
       return;
     }

     fn main_inner() -> void {
       bar_();
       return;
     }

     // Declare a structure type to collect the return values.
     struct main_result_type {
       @location(0) frag_color : vec4<f32>;
     };

     @fragment
     fn main(

       // 'in' variables are entry point parameters
       @location(0) the_color_arg : vec4<f32>

     ) -> main_result_type {

       // Save 'in' arguments to 'private' variables.
       the_color = the_color_arg;

       // Initialize 'out' variables.
       // Use the zero value, since no initializer was specified.
       frag_color = vec4<f32>();

       // Invoke the original entry point.
       main_inner();

       // Collect outputs into a structure and return it.
       var result : main_outer_result_type;
       result.frag_color = frag_color;
       return result;
     }
 ```

 Alternately, we could emit the body of the original entry point at
 the point of invocation.
 However that is more complex because the original entry point function
 may return from multiple locations, and we would like to have only
 a single exit path to construct and return the result value.

 ### Handling fragment discard

 In SPIR-V `OpKill` causes immediate termination of the shader.
 Is the shader obligated to write its outputs when `OpKill` is executed?

 The Vulkan fragment operations are as follows:
 (see [6. Fragment operations](https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#fragops)).

 * Scissor test
 * Sample mask test
 * Fragment shading
 * Multisample coverage
 * Depth bounds test
 * Stencil test
 * Depth test
 * Sample counting
 * Coverage reduction

 After that, the fragment results are used to update output attachments, including
 colour, depth, and stencil attachments.

 Vulkan says:

 > If a fragment operation results in all bits of the coverage mask being 0,
 > the fragment is discarded, and no further operations are performed.
 > Fragments can also be programmatically discarded in a fragment shader by executing one of
 >
 >     OpKill.

 I interpret this to mean that the outputs of a discarded fragment are ignored.

 Therefore, `OpKill` does not require us to modify the basic scheme from the previous
 section.

 The `OpDemoteToHelperInvocationEXT`
 instruction is an alternative way to throw away a fragment, but which
 does not immediately terminate execution of the invocation.
 It is introduced in the [`SPV_EXT_demote_to_helper_invocation](http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/EXT/SPV_EXT_demote_to_helper_invocation.html)
 extension.  WGSL does not have this feature, but we expect it will be introduced by a
 future WGSL extension.  The same analysis applies to demote-to-helper.  When introduced,
 it will not affect translation of pipeline outputs.

 ### Handling depth-replacing mode

 A Vulkan fragment shader must write to the fragment depth builtin if and only if it
 has a `DepthReplacing` execution mode. Otherwise behaviour is undefined.

 We will ignore the case where the SPIR-V shader writes to the `FragDepth` builtin
 and then discards the fragment.
 This is justified because "no further operations" are performed by the pipeline
 after the fragment is discarded, and that includes writing to depth output attachments.

 Assuming the shader is valid, no special translation is required.

 ### Handling output sample mask

 By the same reasoning as for depth-replacing, it is ok to incidentally not write
 to the sample-mask builtin variable when the fragment is discarded.

 ### Handling clip distance and cull distance

 Most builtin variables are scalars or vectors.
 However, the `ClipDistance` and `CullDistance` builtin variables are arrays of 32-bit float values.
 Each entry defines a clip half-plane (respectively cull half-plane)
 A Vulkan implementation must support array sizes of up to 8 elements.

 How prevalent are shaders that use these features?
 These variables are supported when Vulkan features `shaderClipDistance` and `shaderCullDistance`
 are supported.
 According to gpuinfo.org as of this writing, those
 Vulkan features appear to be nearly universally supported on Windows devices (>99%),
 but by only 70% on Android.
 It appears that Qualcomm devices support them, but Mali devices do not (e.g. Mali-G77).

 The proposed translation scheme forces a copy of each array from private
 variables into the return value of a vertex shader, or into a private
 variable of a fragment shader.
 In addition to the register pressure, there may be a performance degradation
 due to the bulk copying of data.

 We think this is an acceptable tradeoff for the gain in usability and
 consistency with other pipeline inputs and outputs.

 ## Translation scheme for WGSL AST to SPIR-V

 To translate from the WGSL AST to SPIR-V, do the following:

 - Each entry point formal parameter is mapped to a SPIR-V `Input` variable.
   - Struct and array inputs may have to be broken down into individual variables.
 - The return of the entry point is broken down into fields, with one
   `Output` variable per field.
 - In the above, builtins must be separated from user attributes.
   - Builtin attributes are moved to the corresponding variable.
   - Location and interpolation attributes are moved to the corresponding
     variables.
 - This translation relies on the fact that pipeline inputs and pipeline
   outputs are IO-shareable types. IO-shareable types are always storable,
   and can be the store type of input/output variables.
 - Input function parameters will be automatically initialized by the system
   as part of setting up the pipeline inputs to the entry point.
 - Replace each return statement in the entry point with a code sequence
   which writes the return value components to the synthesized output variables,
   and then executes an `OpReturn` (without value).

 This translation is sufficient even for fragment shaders with discard.
 In that case, outputs will be ignored because downstream pipeline
 operations will not be performed.
 This is the same rationale as for translation from SPIR-V to WGSL AST.
	# SPIR-V translation of shader input and output variables

	WGSL [MR 1315](https://github.com/gpuweb/gpuweb/issues/1315) changed WGSL so
	that pipeline inputs and outputs are handled similar to HLSL:

	- Shader pipeline inputs are the WGSL entry point function arguments.
	- Shader pipeline outputs are the WGSL entry point return value.

	Note: In both cases, a struct may be used to pack multiple values together.
	In that case, I/O specific attributes appear on struct members at the struct declaration.

	Resource variables, e.g. buffers, samplers, and textures, are still declared
	as variables at module scope.

	## Vulkan SPIR-V today

	SPIR-V for Vulkan models inputs and outputs as module-scope variables in
	the Input and Output storage classes, respectively.

	The `OpEntryPoint` instruction has a list of module-scope variables that must
	be a superset of all the input and output variables that are statically
	accessed in the shader call tree.
	From SPIR-V 1.4 onward, all interface variables that might be statically accessed
	must appear on that list.
	So that includes all resource variables that might be statically accessed
	by the shader call tree.

	## Translation scheme for SPIR-V to WGSL

	A translation scheme from SPIR-V to WGSL is as follows:

	Each SPIR-V entry point maps to a set of Private variables proxying the
	inputs and outputs, and two functions:

	- An inner function with no arguments or return values, and whose body
	is the same as the original SPIR-V entry point.
	- Original input variables are mapped to pseudo-in Private variables
	with the same store types, but no other attributes or properties copied.
	In Vulkan, Input variables don't have initalizers.
	- Original output variables are mapped to pseudo-out Private variables
	with the same store types and optional initializer, but no other attributes
	or properties are copied.
	- A wrapper entry point function whose arguments correspond in type, location
	and builtin attributes the original input variables, and whose return type is
	a structure containing members correspond in type, location, and builtin
	attributes to the original output variables.
	The body of the wrapper function the following phases:
	- Copy formal parameter values into pseudo-in variables.
	- Insert a bitcast if the WGSL builtin variable has different signedness
	from the SPIR-V declared type.
	- Execute the inner function.
	- Copy pseudo-out variables into the return structure.
	- Insert a bitcast if the WGSL builtin variable has different signedness
	from the SPIR-V declared type.
	- Return the return structure.

	- Replace uses of the the original input/output variables to the pseudo-in and
	pseudo-out variables, respectively.
	- Remap pointer-to-Input with pointer-to-Private
	- Remap pointer-to-Output with pointer-to-Private

	We are not concerned with the cost of extra copying input/output values.
	First, the pipeline inputs/outputs tend to be small.
	Second, we expect the backend compiler in the driver will be able to see
	through the copying and optimize the result.

	### Example


	```glsl
	#version 450

	layout(location = 0) out vec4 frag_colour;
	layout(location = 0) in vec4 the_colour;

	void bar() {
	frag_colour = the_colour;
	}

	void main() {
	bar();
	}
	```

	Current translation, through SPIR-V, SPIR-V reader, WGSL writer:

	```groovy
	@location(0) var<out> frag_colour : vec4<f32>;
	@location(0) var<in> the_colour : vec4<f32>;

	fn bar_() -> void {
	const x_14 : vec4<f32> = the_colour;
	frag_colour = x_14;
	return;
	}

	@fragment
	fn main() -> void {
	bar_();
	return;
	}
	```

	Proposed translation, through SPIR-V, SPIR-V reader, WGSL writer:

	```groovy
	// 'in' variables are now 'private'.
	var<private> frag_colour : vec4<f32>;
	var<private> the_colour : vec4<f32>;

	fn bar_() -> void {
	// Accesses to the module-scope variables do not change.
	// This is a big simplifying advantage.
	const x_14 : vec4<f32> = the_colour;
	frag_colour = x_14;
	return;
	}

	fn main_inner() -> void {
	bar_();
	return;
	}

	// Declare a structure type to collect the return values.
	struct main_result_type {
	@location(0) frag_color : vec4<f32>;
	};

	@fragment
	fn main(

	// 'in' variables are entry point parameters
	@location(0) the_color_arg : vec4<f32>

	) -> main_result_type {

	// Save 'in' arguments to 'private' variables.
	the_color = the_color_arg;

	// Initialize 'out' variables.
	// Use the zero value, since no initializer was specified.
	frag_color = vec4<f32>();

	// Invoke the original entry point.
	main_inner();

	// Collect outputs into a structure and return it.
	var result : main_outer_result_type;
	result.frag_color = frag_color;
	return result;
	}
	```

	Alternately, we could emit the body of the original entry point at
	the point of invocation.
	However that is more complex because the original entry point function
	may return from multiple locations, and we would like to have only
	a single exit path to construct and return the result value.

	### Handling fragment discard

	In SPIR-V `OpKill` causes immediate termination of the shader.
	Is the shader obligated to write its outputs when `OpKill` is executed?

	The Vulkan fragment operations are as follows:
	(see [6. Fragment operations](https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#fragops)).

	* Scissor test
	* Sample mask test
	* Fragment shading
	* Multisample coverage
	* Depth bounds test
	* Stencil test
	* Depth test
	* Sample counting
	* Coverage reduction

	After that, the fragment results are used to update output attachments, including
	colour, depth, and stencil attachments.

	Vulkan says:

	> If a fragment operation results in all bits of the coverage mask being 0,
	> the fragment is discarded, and no further operations are performed.
	> Fragments can also be programmatically discarded in a fragment shader by executing one of
	>
	> OpKill.

	I interpret this to mean that the outputs of a discarded fragment are ignored.

	Therefore, `OpKill` does not require us to modify the basic scheme from the previous
	section.

	The `OpDemoteToHelperInvocationEXT`
	instruction is an alternative way to throw away a fragment, but which
	does not immediately terminate execution of the invocation.
	It is introduced in the [`SPV_EXT_demote_to_helper_invocation](http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/EXT/SPV_EXT_demote_to_helper_invocation.html)
	extension. WGSL does not have this feature, but we expect it will be introduced by a
	future WGSL extension. The same analysis applies to demote-to-helper. When introduced,
	it will not affect translation of pipeline outputs.

	### Handling depth-replacing mode

	A Vulkan fragment shader must write to the fragment depth builtin if and only if it
	has a `DepthReplacing` execution mode. Otherwise behaviour is undefined.

	We will ignore the case where the SPIR-V shader writes to the `FragDepth` builtin
	and then discards the fragment.
	This is justified because "no further operations" are performed by the pipeline
	after the fragment is discarded, and that includes writing to depth output attachments.

	Assuming the shader is valid, no special translation is required.

	### Handling output sample mask

	By the same reasoning as for depth-replacing, it is ok to incidentally not write
	to the sample-mask builtin variable when the fragment is discarded.

	### Handling clip distance and cull distance

	Most builtin variables are scalars or vectors.
	However, the `ClipDistance` and `CullDistance` builtin variables are arrays of 32-bit float values.
	Each entry defines a clip half-plane (respectively cull half-plane)
	A Vulkan implementation must support array sizes of up to 8 elements.

	How prevalent are shaders that use these features?
	These variables are supported when Vulkan features `shaderClipDistance` and `shaderCullDistance`
	are supported.
	According to gpuinfo.org as of this writing, those
	Vulkan features appear to be nearly universally supported on Windows devices (>99%),
	but by only 70% on Android.
	It appears that Qualcomm devices support them, but Mali devices do not (e.g. Mali-G77).

	The proposed translation scheme forces a copy of each array from private
	variables into the return value of a vertex shader, or into a private
	variable of a fragment shader.
	In addition to the register pressure, there may be a performance degradation
	due to the bulk copying of data.

	We think this is an acceptable tradeoff for the gain in usability and
	consistency with other pipeline inputs and outputs.

	## Translation scheme for WGSL AST to SPIR-V

	To translate from the WGSL AST to SPIR-V, do the following:

	- Each entry point formal parameter is mapped to a SPIR-V `Input` variable.
	- Struct and array inputs may have to be broken down into individual variables.
	- The return of the entry point is broken down into fields, with one
	`Output` variable per field.
	- In the above, builtins must be separated from user attributes.
	- Builtin attributes are moved to the corresponding variable.
	- Location and interpolation attributes are moved to the corresponding
	variables.
	- This translation relies on the fact that pipeline inputs and pipeline
	outputs are IO-shareable types. IO-shareable types are always storable,
	and can be the store type of input/output variables.
	- Input function parameters will be automatically initialized by the system
	as part of setting up the pipeline inputs to the entry point.
	- Replace each return statement in the entry point with a code sequence
	which writes the return value components to the synthesized output variables,
	and then executes an `OpReturn` (without value).

	This translation is sufficient even for fragment shaders with discard.
	In that case, outputs will be ignored because downstream pipeline
	operations will not be performed.
	This is the same rationale as for translation from SPIR-V to WGSL AST.