GPU/Shader Instruction Set: Difference between revisions

Neobrain (talk | contribs)
Fix LOOP instruction: Number of iterations is actually given by INT.x directly
Neobrain (talk | contribs)
Clean up register info
Line 507: Line 507:
|  3
|  3
|  LOOP
|  LOOP
|  Loops over the code between itself and DST (inclusive), performing INT.x+1 iterations in total. First, aL is initialized to INT.y. After each iteration, aL is incremented by INT.z. (INT is i0-i3, an integer vector uniform.)
|  Loops over the code between itself and DST (inclusive), performing INT.x+1 iterations in total. First, aL is initialized to INT.y. After each iteration, aL is incremented by INT.z.
|-
|-
|  0x2A
|  0x2A
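As a rough illustration of the LOOP description above, the following Python sketch lists the aL value each iteration would see. It assumes only what the table row states (INT.x+1 iterations, aL starting at INT.y and stepping by INT.z). The function name `loop_al_values` is hypothetical, and any wrap-around behaviour of aL on real hardware is not modelled.

```python
def loop_al_values(int_x, int_y, int_z):
    """Return the aL value seen by each iteration of a LOOP (sketch only)."""
    al = int_y                     # aL is initialized to INT.y
    values = []
    for _ in range(int_x + 1):     # INT.x + 1 iterations in total
        values.append(al)
        al += int_z                # aL is incremented by INT.z after each iteration
    return values


if __name__ == "__main__":
    # INT = (3, 0, 2): four iterations with aL = 0, 2, 4, 6
    print(loop_al_values(3, 0, 2))
```

With INT.x = 0 the body still runs once, which is the point of the "+1" in the description.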
Line 708: Line 708:


== Registers ==
== Registers ==
Most registers (all the ones within the 0x00-0x7F range) are float[4] vectors. There are also boolean registers (b0-b7) and integer registers (i0-i7). How the latter ones are set is as of yet unknown.
Input attribute registers (v0-v7?) store the per-vertex data given by the CPU and hence are read-only.


Attribute (input, RO) registers are located within the 0x0-0xF range. What data they are fed is specified by the CPU.
Output attribute registers (o0-o6) hold the data to be passed to the later GPU stages and are write-only. Each component of an output attribute register is assigned a semantic by setting the corresponding [[GPU_Internal_Registers]].


Output (WO) registers are also located within the 0x0-0xF range. What type of data they contain is specified by the CPU.
Uniform registers hold user-specified data that is constant across all processed vertices. There are 96 float[4] uniform registers (c0-c95), eight boolean registers (b0-b7), and four int[4] registers (i0-i3).


Temporary (RW) registers are located within the 0x10-0x1F range. They can contain any type of data.
Temporary registers (r0-r15) can be used for intermediate calculations and can both be read and written.


Uniform (RO) registers are located within the 0x20-0x7F range. Their content is set by the CPU.
Many shader instructions that take float arguments have only 5 bits available for the second argument, which can therefore refer only to input attribute or temporary registers. In particular, it is not possible to pass two float[4] uniforms to these instructions.


Because SRC2 is only 5 bits long, rather than 7 bits like SRC1, it can only access v (input attribute) and r (temporary) registers.
It appears that writing twice to the same output register can cause problems (e.g. GPU hangs).
 
Registers in the 0x88-0x97 range are uniform booleans.
 
It appears that writing twice to the same output register can cause problems, such as the GPU hanging.


DST mapping:
DST mapping:
Line 761: Line 757:
|  Vector uniform registers.
|  Vector uniform registers.
|}
|}
Note that 5-bit SRC registers (SRC2 in format 1, for example) can't access c0-c95 because they don't have enough bits.
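The 5-bit limitation can be checked with a small Python sketch. It assumes the source index ranges given in the Registers text (0x00-0x0F input attributes, 0x10-0x1F temporaries, 0x20-0x7F float[4] uniforms). The helper names `classify_src` and `fits_in_5_bits` are hypothetical and only serve the illustration.

```python
def classify_src(index):
    """Map a 7-bit source register index to a register name (sketch only)."""
    if 0x00 <= index <= 0x0F:
        return f"v{index}"           # input attribute register
    if 0x10 <= index <= 0x1F:
        return f"r{index - 0x10}"    # temporary register
    if 0x20 <= index <= 0x7F:
        return f"c{index - 0x20}"    # float[4] uniform register
    raise ValueError(f"index {index:#x} is outside the 7-bit source range")


def fits_in_5_bits(index):
    """True if the index can be encoded in a 5-bit source field."""
    return 0 <= index < (1 << 5)


if __name__ == "__main__":
    for idx in (0x00, 0x1F, 0x20, 0x7F):
        print(hex(idx), classify_src(idx), fits_in_5_bits(idx))
```

The largest 5-bit value is 0x1F, the last temporary register, so every uniform index (0x20 and above) falls outside what a 5-bit field can express.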