# SPIR-V translation of shader input and output variables

WGSL [MR 1315](https://github.com/gpuweb/gpuweb/issues/1315) changed WGSL so
that pipeline inputs and outputs are handled similarly to HLSL:

- Shader pipeline inputs are the WGSL entry point function arguments.
- Shader pipeline outputs are the WGSL entry point return value.

Note: In both cases, a struct may be used to pack multiple values together.
In that case, I/O-specific attributes appear on struct members at the struct declaration.

Resource variables, e.g. buffers, samplers, and textures, are still declared
as variables at module scope.

## Vulkan SPIR-V today

SPIR-V for Vulkan models inputs and outputs as module-scope variables in
the Input and Output storage classes, respectively.

The `OpEntryPoint` instruction has a list of module-scope variables that must
be a superset of all the input and output variables that are statically
accessed in the shader call tree.
From SPIR-V 1.4 onward, all interface variables that might be statically accessed
must appear on that list.
This includes all resource variables that might be statically accessed
by the shader call tree.

## Translation scheme for SPIR-V to WGSL

A translation scheme from SPIR-V to WGSL is as follows:

Each SPIR-V entry point maps to a set of Private variables proxying the
inputs and outputs, and two functions:

- An inner function with no arguments or return values, and whose body
  is the same as the original SPIR-V entry point.
  - Original input variables are mapped to pseudo-in Private variables
    with the same store types, but no other attributes or properties are copied.
    In Vulkan, Input variables don't have initializers.
  - Original output variables are mapped to pseudo-out Private variables
    with the same store types and optional initializer, but no other attributes
    or properties are copied.
- A wrapper entry point function whose arguments correspond in type, location,
  and builtin attributes to the original input variables, and whose return type is
  a structure containing members that correspond in type, location, and builtin
  attributes to the original output variables.
  The body of the wrapper function has the following phases:
  - Copy formal parameter values into pseudo-in variables.
    - Insert a bitcast if the WGSL builtin variable has different signedness
      from the SPIR-V declared type.
  - Execute the inner function.
  - Copy pseudo-out variables into the return structure.
    - Insert a bitcast if the WGSL builtin variable has different signedness
      from the SPIR-V declared type.
  - Return the return structure.

- Replace uses of the original input/output variables with the pseudo-in and
  pseudo-out variables, respectively.
- Remap pointer-to-Input to pointer-to-Private.
- Remap pointer-to-Output to pointer-to-Private.

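
The bitcast steps in the wrapper can be sketched as follows. This is only an illustration, assuming a fragment shader whose SPIR-V declares `SampleId` as a signed integer and `SampleMask` as an array of signed integers, while the WGSL `sample_index` and `sample_mask` builtins are `u32`. The names `x_sample_id`, `x_sample_mask_out`, `x_colour`, and `main_out` are hypothetical, and the sketch uses current WGSL syntax (`@fragment`, comma-separated struct members) rather than the draft syntax of the example later in this document.

```wgsl
// Pseudo-in and pseudo-out variables keep the SPIR-V declared (signed) types.
var<private> x_sample_id : i32;                  // proxies the SampleId input
var<private> x_sample_mask_out : array<i32, 1>;  // proxies the SampleMask output
var<private> x_colour : vec4<f32>;               // proxies a location-0 output

// Stand-in for the original entry point body, which only sees the proxies.
fn main_inner() {
  if (x_sample_id == 0) {
    x_sample_mask_out[0] = 1;
  }
  x_colour = vec4<f32>(1.0, 0.0, 0.0, 1.0);
}

struct main_out {
  @location(0) colour : vec4<f32>,
  @builtin(sample_mask) sample_mask : u32,
}

@fragment
fn main(@builtin(sample_index) sample_index_arg : u32) -> main_out {
  // Copy in: the WGSL builtin is unsigned, the proxy is signed.
  x_sample_id = bitcast<i32>(sample_index_arg);
  main_inner();
  // Copy out: the proxy is signed, the WGSL builtin is unsigned.
  return main_out(x_colour, bitcast<u32>(x_sample_mask_out[0]));
}
```
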
We are not concerned with the cost of the extra copying of input/output values.
First, the pipeline inputs/outputs tend to be small.
Second, we expect the backend compiler in the driver will be able to see
through the copying and optimize the result.

### Example


```glsl
#version 450

layout(location = 0) out vec4 frag_colour;
layout(location = 0) in vec4 the_colour;

void bar() {
  frag_colour = the_colour;
}

void main() {
  bar();
}
```

Current translation, through SPIR-V, SPIR-V reader, WGSL writer:

```groovy
@location(0) var<out> frag_colour : vec4<f32>;
@location(0) var<in> the_colour : vec4<f32>;

fn bar_() -> void {
  const x_14 : vec4<f32> = the_colour;
  frag_colour = x_14;
  return;
}

@stage(fragment)
fn main() -> void {
  bar_();
  return;
}
```

Proposed translation, through SPIR-V, SPIR-V reader, WGSL writer:

```groovy
// 'in' and 'out' variables are now 'private'.
var<private> frag_colour : vec4<f32>;
var<private> the_colour : vec4<f32>;

fn bar_() -> void {
  // Accesses to the module-scope variables do not change.
  // This is a big simplifying advantage.
  const x_14 : vec4<f32> = the_colour;
  frag_colour = x_14;
  return;
}

fn main_inner() -> void {
  bar_();
  return;
}

// Declare a structure type to collect the return values.
struct main_result_type {
  @location(0) frag_colour : vec4<f32>;
};

@stage(fragment)
fn main(

  // 'in' variables are entry point parameters
  @location(0) the_colour_arg : vec4<f32>

) -> main_result_type {

  // Save 'in' arguments to 'private' variables.
  the_colour = the_colour_arg;

  // Initialize 'out' variables.
  // Use the zero value, since no initializer was specified.
  frag_colour = vec4<f32>();

  // Invoke the original entry point.
  main_inner();

  // Collect outputs into a structure and return it.
  var result : main_result_type;
  result.frag_colour = frag_colour;
  return result;
}
```

Alternatively, we could emit the body of the original entry point at
the point of invocation.
However, that is more complex because the original entry point function
may return from multiple locations, and we would like to have only
a single exit path to construct and return the result value.

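
To make the multiple-return concern concrete, here is a minimal sketch, assuming a hypothetical original entry point with an early exit. It uses current WGSL syntax rather than the draft syntax above. Keeping the body in `main_inner` leaves both `return` statements untouched, and the wrapper still has exactly one place where the result is built. Had the body been inlined into the wrapper, each `return` would need to be rewritten to construct and return `main_result_type`.

```wgsl
var<private> the_colour : vec4<f32>;
var<private> frag_colour : vec4<f32>;

// Hypothetical original entry point body with two exit paths.
fn main_inner() {
  if (the_colour.a == 0.0) {
    frag_colour = vec4<f32>();
    return;  // early exit
  }
  frag_colour = the_colour;
  return;    // normal exit
}

struct main_result_type {
  @location(0) frag_colour : vec4<f32>,
}

@fragment
fn main(@location(0) the_colour_arg : vec4<f32>) -> main_result_type {
  the_colour = the_colour_arg;
  main_inner();
  // Single exit path, regardless of which return main_inner took.
  return main_result_type(frag_colour);
}
```
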
### Handling fragment discard

In SPIR-V, `OpKill` causes immediate termination of the shader.
Is the shader obligated to write its outputs when `OpKill` is executed?

The Vulkan fragment operations are as follows
(see [6. Fragment operations](https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#fragops)):

* Scissor test
* Sample mask test
* Fragment shading
* Multisample coverage
* Depth bounds test
* Stencil test
* Depth test
* Sample counting
* Coverage reduction

After that, the fragment results are used to update output attachments, including
colour, depth, and stencil attachments.

Vulkan says:

> If a fragment operation results in all bits of the coverage mask being 0,
> the fragment is discarded, and no further operations are performed.
> Fragments can also be programmatically discarded in a fragment shader by executing one of
>
> * OpKill.

I interpret this to mean that the outputs of a discarded fragment are ignored.

Therefore, `OpKill` does not require us to modify the basic scheme from the previous
section.

The `OpDemoteToHelperInvocationEXT`
instruction is an alternative way to throw away a fragment, but one that
does not immediately terminate execution of the invocation.
It is introduced in the [`SPV_EXT_demote_to_helper_invocation`](http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/EXT/SPV_EXT_demote_to_helper_invocation.html)
extension. WGSL does not have this feature, but we expect it will be introduced by a
future WGSL extension. The same analysis applies to demote-to-helper. When introduced,
it will not affect translation of pipeline outputs.

### Handling depth-replacing mode

A Vulkan fragment shader must write to the fragment depth builtin if and only if it
has a `DepthReplacing` execution mode. Otherwise behaviour is undefined.

We will ignore the case where the SPIR-V shader writes to the `FragDepth` builtin
and then discards the fragment.
This is justified because "no further operations" are performed by the pipeline
after the fragment is discarded, and that includes writing to depth output attachments.

Assuming the shader is valid, no special translation is required.

### Handling output sample mask

By the same reasoning as for depth-replacing, it is acceptable for the
sample-mask builtin variable to go unwritten when the fragment is discarded.

### Handling clip distance and cull distance

Most builtin variables are scalars or vectors.
However, the `ClipDistance` and `CullDistance` builtin variables are arrays of 32-bit float values.
Each entry defines a clip half-plane (respectively cull half-plane).
A Vulkan implementation must support array sizes of up to 8 elements.

How prevalent are shaders that use these features?
These variables are available when the Vulkan features `shaderClipDistance` and `shaderCullDistance`
are supported.
According to gpuinfo.org as of this writing, those
Vulkan features appear to be nearly universally supported on Windows devices (>99%),
but on only 70% of Android devices.
It appears that Qualcomm devices support them, but Mali devices do not (e.g. Mali-G77).

The proposed translation scheme forces a copy of each array from private
variables into the return value of a vertex shader, or into a private
variable of a fragment shader.
In addition to the register pressure, there may be a performance degradation
due to the bulk copying of data.

We think this is an acceptable tradeoff for the gain in usability and
consistency with other pipeline inputs and outputs.

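
The array copy in the vertex-shader case might look like the sketch below. It assumes the `clip_distances` language extension and `@builtin(clip_distances)` output that later versions of WGSL provide (WGSL has no counterpart for `CullDistance`), and it uses current WGSL syntax. The names `x_position`, `x_clip`, and `main_out`, and the array size of 2, are hypothetical.

```wgsl
enable clip_distances;

// Pseudo-out variables proxying the Position and ClipDistance outputs.
var<private> x_position : vec4<f32>;
var<private> x_clip : array<f32, 2>;

// Stand-in for the original entry point body.
fn main_inner() {
  x_position = vec4<f32>(0.0, 0.0, 0.0, 1.0);
  x_clip[0] = 1.0;
  x_clip[1] = -1.0;
}

struct main_out {
  @builtin(position) position : vec4<f32>,
  @builtin(clip_distances) clip : array<f32, 2>,
}

@vertex
fn main() -> main_out {
  main_inner();
  // Whole-array copy from the private variable into the return value.
  return main_out(x_position, x_clip);
}
```
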
## Translation scheme for WGSL AST to SPIR-V

To translate from the WGSL AST to SPIR-V, do the following:

- Each entry point formal parameter is mapped to a SPIR-V `Input` variable.
  - Struct and array inputs may have to be broken down into individual variables.
- The return of the entry point is broken down into fields, with one
  `Output` variable per field.
- In the above, builtins must be separated from user attributes.
  - Builtin attributes are moved to the corresponding variable.
  - Location and interpolation attributes are moved to the corresponding
    variables.
- This translation relies on the fact that pipeline inputs and pipeline
  outputs are IO-shareable types. IO-shareable types are always storable,
  and can be the store type of input/output variables.
- Input function parameters will be automatically initialized by the system
  as part of setting up the pipeline inputs to the entry point.
- Replace each return statement in the entry point with a code sequence
  which writes the return value components to the synthesized output variables,
  and then executes an `OpReturn` (without value).

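
As an illustration of the steps above, consider the following WGSL input, written in current WGSL syntax. The shader and the names `VertexIn`, `VertexOut`, and `vs_main` are hypothetical. The comments describe the SPIR-V variables and decorations the scheme would be expected to synthesize; they are a reading of the list above, not output from an actual compiler.

```wgsl
struct VertexIn {
  // -> one Input variable decorated with Location 0
  @location(0) pos : vec4<f32>,
  // -> one Input variable decorated with BuiltIn VertexIndex
  @builtin(vertex_index) idx : u32,
}

struct VertexOut {
  // -> one Output variable decorated with BuiltIn Position
  @builtin(position) pos : vec4<f32>,
  // -> one Output variable decorated with Location 0 and Flat
  @location(0) @interpolate(flat) id : u32,
}

@vertex
fn vs_main(input : VertexIn) -> VertexOut {
  if (input.idx == 0u) {
    // Return site 1: store both members to the Output variables,
    // then OpReturn without a value.
    return VertexOut(vec4<f32>(), 0u);
  }
  // Return site 2: same store sequence, then OpReturn.
  return VertexOut(input.pos, input.idx);
}
```
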
This translation is sufficient even for fragment shaders with discard.
In that case, outputs will be ignored because downstream pipeline
operations will not be performed.
This is the same rationale as for translation from SPIR-V to WGSL AST.