GPU/External Registers: Difference between revisions
m Miss Information |
| (8 intermediate revisions by 2 users not shown) | |||
| Line 57: | Line 57: | ||
| ? | | ? | ||
| Writes 0xFF2 on GPU init. | | Writes 0xFF2 on GPU init. | ||
|- | |||
| 0x1EF00064 | |||
| 0x10400064 | |||
| 0xC | |||
| [[#Hotspot Profiling|Hotspot Profiling]] registers | |||
|- | |- | ||
| 0x1EF000C0 | | 0x1EF000C0 | ||
| Line 140: | Line 145: | ||
Memory fills are used to initialize buffers in memory with a given value, similar to memset. A memory fill is triggered by setting bit0 in the control register. Doing so aborts any running memory fills on that filling unit. Upon completion, the hardware unsets bit0 and sets bit1 and fires interrupt PSC0. | Memory fills are used to initialize buffers in memory with a given value, similar to memset. A memory fill is triggered by setting bit0 in the control register. Doing so aborts any running memory fills on that filling unit. Upon completion, the hardware unsets bit0 and sets bit1 and fires interrupt PSC0. | ||
The addresses must be part of VRAM. | |||
These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]]. | These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]]. | ||
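The control-register behaviour described above (bit0 triggers a fill and is cleared by hardware, bit1 is set on completion) can be sketched as a small decoder. The function name and the state labels are hypothetical and chosen for illustration only; the bit positions are the only part taken from the text.

```python
FILL_START = 1 << 0     # bit0: written as 1 to trigger a fill; hardware clears it on completion
FILL_FINISHED = 1 << 1  # bit1: set by hardware when the fill has completed


def fill_state(control: int) -> str:
    """Classify a memory fill control register value (labels are illustrative)."""
    if control & FILL_START:
        return "busy"
    if control & FILL_FINISHED:
        return "finished"
    return "idle"
```

A poll loop would read the control register repeatedly until `fill_state` stops returning `"busy"`, or wait for the PSC0 interrupt instead.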
== Hotspot Profiling == | |||
{| class="wikitable" border="1" | |||
! User VA | |||
! Bits | |||
! Description | |||
|- | |||
| 0x1EF00064 | |||
| 0x00000001 | |||
| Enable bit | |||
|- | |||
| 0x1EF00068 | |||
| 0x0000FFFF | |||
| Interval count | |||
|- | |||
| 0x1EF00068 | |||
| 0xFFFF0000 | |||
| Interval length - 1 | |||
|- | |||
| 0x1EF0006C | |||
| 0xFFFFFFFF | |||
| Result FIFO | |||
|} | |||
These registers provide a way to profile which parts of the GPU hardware are busiest (working or stalling) during a measuring interval. | |||
What exactly each counter value corresponds to is unclear, but the feature likely exists to let developers identify bottlenecks in the rendering pipeline. | |||
The interval count is the number of consecutive intervals that will be recorded once measurement has started. | |||
When the interval count is set to 0, the measurement continues to run until the Result FIFO is read at least once. | |||
When measuring for longer than 0xFFFF intervals, the counters are reset to 0 when the total number of measurements overflows. | |||
In total there are 8 counters for different stages of the GPU pipeline. | |||
For each measurement interval, one GPU stage has its counter increased, so that after measurement the sum of all counters equals the interval count. | |||
The interval length is the number of GPU clock cycles that each measurement interval lasts; the register field stores this value minus 1. | |||
The GPU runs at roughly 268 MHz; see [[Hardware#Common hardware|Common hardware]] for the exact frequency. | |||
<br> Note: for an interval length < 3, the stage whose counter is increased seems to always be the first one. This may need more testing. | |||
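As a rough illustration of the field layout at 0x1EF00068 (interval count in bits 0-15, interval length minus 1 in bits 16-31), the following sketch packs the two values into one word and estimates how long one interval lasts. The function names are hypothetical, and the clock constant is the approximate 268 MHz figure rather than the exact frequency.

```python
GPU_CLOCK_HZ = 268_000_000  # approximate value; the exact frequency is documented elsewhere


def pack_interval_register(interval_count: int, interval_length_cycles: int) -> int:
    """Build the value for 0x1EF00068 from a count and a length in GPU cycles."""
    if not 0 <= interval_count <= 0xFFFF:
        raise ValueError("interval count must fit in 16 bits")
    if not 1 <= interval_length_cycles <= 0x10000:
        raise ValueError("interval length must be between 1 and 0x10000 cycles")
    # The upper half stores the length minus 1, the lower half the count.
    return ((interval_length_cycles - 1) << 16) | interval_count


def interval_seconds(interval_length_cycles: int) -> float:
    """Approximate wall-clock duration of one measurement interval."""
    return interval_length_cycles / GPU_CLOCK_HZ
```

For example, 100 intervals of 1000 cycles each pack to `0x03E70064`, and each interval lasts a little under 4 microseconds at the assumed clock.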
Writing 1 to the enable bit starts the measurement. | |||
The resulting data is obtained by reading from the Result FIFO 4 times. | |||
Each u32 word contains two u16 counters. | |||
The table below contains educated guesses, based on some testing, at which hardware each counter corresponds to. | |||
{| class="wikitable" border="1" | |||
! Word | |||
! Bits | |||
! GPU Module | |||
! Reasoning | |||
|- | |||
| 0 | |||
| 0x0000FFFF | |||
| Array reads | |||
| Depends on input primitives and the size of vertex attributes, irrespective of what ends up on screen. | |||
|- | |||
| 0 | |||
| 0xFFFF0000 | |||
| Vertex shader | |||
| Can be increased by inserting more operations into the vertex shader. | |||
|- | |||
| 1 | |||
| 0x0000FFFF | |||
| Primitive setup / Culling | |||
| Sits between Rasterizer and Vertex shader and only slightly depends on what is on screen. | |||
|- | |||
| 1 | |||
| 0xFFFF0000 | |||
| Rasterizer | |||
| Depends on the number and size of triangles on screen. | |||
|- | |||
| 2 | |||
| 0x0000FFFF | |||
| Texture reads | |||
| Depends on the density and total number of texels on screen. | |||
|- | |||
| 2 | |||
| 0xFFFF0000 | |||
| Lighting calculations | |||
| Depends on enabled lighting settings. | |||
|- | |||
| 3 | |||
| 0x0000FFFF | |||
| Color combiners | |||
| Depends on enabled TexEnv stages. | |||
|- | |||
| 3 | |||
| 0xFFFF0000 | |||
| Framebuffer operations | |||
| Depends on area covered. | |||
|} | |||
What happens when the FIFO is read before a fixed-length measurement has completed has not been tested. | |||
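The four Result FIFO words can be split into the eight u16 counters as in the sketch below. The stage labels simply mirror the guesses in the table above and are not confirmed hardware names; the function name is hypothetical.

```python
# Labels follow the guessed mapping from the table: low half first, then high half.
STAGE_NAMES = (
    "array_reads", "vertex_shader",
    "primitive_setup", "rasterizer",
    "texture_reads", "lighting",
    "color_combiners", "framebuffer_ops",
)


def decode_hotspot_words(words):
    """Split four u32 FIFO words into eight named u16 counters."""
    if len(words) != 4:
        raise ValueError("expected exactly 4 words from the Result FIFO")
    counters = {}
    for index, word in enumerate(words):
        counters[STAGE_NAMES[2 * index]] = word & 0xFFFF              # bits 0-15
        counters[STAGE_NAMES[2 * index + 1]] = (word >> 16) & 0xFFFF  # bits 16-31
    return counters
```

Since one counter is increased per interval, `sum(decode_hotspot_words(words).values())` should equal the interval count for a completed fixed-length measurement.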
== LCD Source Framebuffer Setup == | == LCD Source Framebuffer Setup == | ||
| Line 473: | Line 572: | ||
|- | |- | ||
| 0x1EF00C08 | | 0x1EF00C08 | ||
| DisplayTransfer output width (bits 0-15) and height (bits 16-31) | | DisplayTransfer output width (bits 0-15) and height (bits 16-31) | ||
|- | |- | ||
| 0x1EF00C0C | | 0x1EF00C0C | ||
| DisplayTransfer input width and height | | DisplayTransfer input width and height | ||
|- | |- | ||
| 0x1EF00C10 | | 0x1EF00C10 | ||
| Transfer flags | | Transfer flags | ||
|- | |- | ||
| 0x1EF00C14 | | 0x1EF00C14 | ||
| GSP | | ?, GSP writes value 0 here prior to writing to 0x1EF00C18 for DisplayTransfer | ||
|- | |- | ||
| 0x1EF00C18 | | 0x1EF00C18 | ||
| Setting bit0 starts the transfer | | Setting bit0 starts the transfer; upon completion, bit0 is unset and bit8 is set | ||
|- | |- | ||
| 0x1EF00C1C | | 0x1EF00C1C | ||
| Line 491: | Line 590: | ||
|- | |- | ||
| 0x1EF00C20 | | 0x1EF00C20 | ||
| TextureCopy total amount of data to copy, in bytes | | TextureCopy total amount of data to copy, in bytes | ||
|- | |- | ||
| 0x1EF00C24 | | 0x1EF00C24 | ||
| TextureCopy input line width (bits 0-15) and gap (bits 16-31), in 16 byte units | | TextureCopy input line width (bits 0-15) and gap (bits 16-31), in 16 byte units | ||
|- | |- | ||
| 0x1EF00C28 | | 0x1EF00C28 | ||
| TextureCopy output line width and gap | | TextureCopy output line width and gap | ||
|} | |} | ||
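The packed fields in the table above can be built as in this sketch. It assumes the input dimension register uses the same layout as the output one (width in bits 0-15, height in bits 16-31) and that the TextureCopy output register mirrors the input one; both helper names are hypothetical.

```python
def pack_dimensions(width: int, height: int) -> int:
    """Pack width (bits 0-15) and height (bits 16-31) for a DisplayTransfer."""
    if not (0 <= width <= 0xFFFF and 0 <= height <= 0xFFFF):
        raise ValueError("width and height must each fit in 16 bits")
    return (height << 16) | width


def pack_texcopy_line(line_width_bytes: int, gap_bytes: int) -> int:
    """Pack a TextureCopy line width and gap, both stored in 16-byte units."""
    if line_width_bytes % 16 or gap_bytes % 16:
        raise ValueError("line width and gap must be multiples of 16 bytes")
    width_units = line_width_bytes // 16
    gap_units = gap_bytes // 16
    if not (0 <= width_units <= 0xFFFF and 0 <= gap_units <= 0xFFFF):
        raise ValueError("line width and gap must each fit in 16 bits of units")
    return (gap_units << 16) | width_units
```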
Transfer flags: | |||
{| class="wikitable" border="1" | {| class="wikitable" border="1" | ||
! Bit | ! Bit | ||
| Line 508: | Line 606: | ||
|- | |- | ||
| 0 | | 0 | ||
| When set, the framebuffer data is flipped vertically | | When set, the framebuffer data is flipped vertically | ||
|- | |- | ||
| 1 | | 1 | ||
| | | Linear->tiled mode (overrides tiled->linear mode) | ||
|- | |- | ||
| 2 | | 2 | ||
| This bit is required when the output width is less than the input width for the hardware to properly crop the lines, otherwise the output will be mis-aligned | | This bit is required when the output width is less than the input width for the hardware to properly crop the lines, otherwise the output will be mis-aligned | ||
|- | |- | ||
| 3 | | 3 | ||
| | | TextureCopy mode (overrides all other modes) | ||
|- | |- | ||
| 4 | | 4 | ||
| Line 523: | Line 621: | ||
|- | |- | ||
| 5 | | 5 | ||
| | | Tiled->tiled mode (overrides tiled->linear, linear->tiled modes) | ||
|- | |- | ||
| 7-6 | | 7-6 | ||
| Line 529: | Line 627: | ||
|- | |- | ||
| 10-8 | | 10-8 | ||
| Input | | Input [[GPU/External_Registers#Framebuffer_color_formats|color format]] | ||
|- | |- | ||
| 11 | | 11 | ||
| Line 535: | Line 633: | ||
|- | |- | ||
| 14-12 | | 14-12 | ||
| Output | | Output color format | ||
|- | |- | ||
| 15 | | 15 | ||
| Line 541: | Line 639: | ||
|- | |- | ||
| 16 | | 16 | ||
| Use 32x32 block tiling mode, instead of the usual 8x8 one | | Use 32x32 block tiling mode, instead of the usual 8x8 one (output dimensions must be multiples of 32, even if cropping with bit 2 set above) | ||
|- | |- | ||
| 17-23 | | 17-23 | ||
| Line 547: | Line 645: | ||
|- | |- | ||
| 24-25 | | 24-25 | ||
| Scale down the input image using a box filter | | Scale down the input image using a box filter (0 = No downscale, 1 = 2x1 downscale, 2 = 2x2 downscale, 3 = invalid) | ||
|- | |- | ||
| 31-26 | | 31-26 | ||
| Not writable | | Not writable | ||
|} | |} | ||
These registers are used by [[GSP_Shared_Memory#Commands|GSP]] for DisplayTransfer and TextureCopy. TextureCopy registers are only used in TextureCopy mode; likewise, DisplayTransfer registers are only used when TextureCopy mode is not set. By default, DisplayTransfer will work in tiled->linear mode. | |||
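A minimal sketch of composing the transfer flags word and resolving which mode wins, based only on the bit positions and override rules listed above. The function and parameter names are hypothetical, the color format arguments are the raw 3-bit values (their encoding is not defined here), and bits not described in this section are left at 0.

```python
def make_transfer_flags(*, flip_vertical=False, linear_to_tiled=False, crop=False,
                        texture_copy=False, tiled_to_tiled=False,
                        input_format=0, output_format=0,
                        block_32x32=False, downscale=0):
    """Compose the transfer flags register value from individual settings."""
    if not (0 <= input_format <= 7 and 0 <= output_format <= 7):
        raise ValueError("color formats are 3-bit fields")
    if downscale not in (0, 1, 2):
        raise ValueError("downscale must be 0 (none), 1 (2x1) or 2 (2x2)")
    return (
        (int(flip_vertical) << 0)
        | (int(linear_to_tiled) << 1)
        | (int(crop) << 2)
        | (int(texture_copy) << 3)
        | (int(tiled_to_tiled) << 5)
        | (input_format << 8)
        | (output_format << 12)
        | (int(block_32x32) << 16)
        | (downscale << 24)
    )


def transfer_mode(flags):
    """Resolve the effective mode using the documented override order."""
    if flags & (1 << 3):
        return "texture copy"   # overrides all other modes
    if flags & (1 << 5):
        return "tiled->tiled"   # overrides tiled->linear and linear->tiled
    if flags & (1 << 1):
        return "linear->tiled"  # overrides tiled->linear
    return "tiled->linear"      # default DisplayTransfer mode
```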
=== Tiled to linear === | |||
Unswizzles the input buffer. This is usually used for transferring GPU framebuffer data onto LCD framebuffers. The following constraints apply: | |||
* Output dimensions must not be bigger than input ones. | |||
* Width dimensions must be >= 64. | |||
* Height dimensions must be >= 16. | |||
* Width dimensions are required to be aligned to 16 bytes when doing RGB8 transfers. | |||
** Otherwise they are required to be aligned to 8 bytes. | |||
* If downscale is used, input and output dimensions should be the same (otherwise the output is glitched), and width/2 must also follow alignment constraints. | |||
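The constraints above can be turned into a pre-flight check along these lines. This is a sketch under assumptions: the alignment values (16 for RGB8, 8 otherwise) are applied directly to the width values passed in, since the list does not say whether "bytes" refers to a pixel count or a byte stride, and the `rgb8` flag does not distinguish between input and output format. The function name and messages are hypothetical.

```python
def check_tiled_to_linear(in_w, in_h, out_w, out_h, *, rgb8=False, downscale=False):
    """Return a list of violated tiled->linear constraints (empty if none)."""
    problems = []
    if out_w > in_w or out_h > in_h:
        problems.append("output larger than input")
    if min(in_w, out_w) < 64:
        problems.append("width below 64")
    if min(in_h, out_h) < 16:
        problems.append("height below 16")
    align = 16 if rgb8 else 8  # assumption: applied to the width value as given
    if in_w % align or out_w % align:
        problems.append(f"width not aligned to {align}")
    if downscale:
        if (in_w, in_h) != (out_w, out_h):
            problems.append("downscale needs equal input and output dimensions")
        if (in_w // 2) % align:
            problems.append(f"width/2 not aligned to {align}")
    return problems
```

Returning every violated constraint, rather than stopping at the first, makes it easier to see why a given transfer configuration produces glitched output.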
Format conversion results: | |||
{| class="wikitable" border="1" | |||
! Conversion | |||
! Result | |||
|- | |||
| RGBA8 -> RGBA8 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA8 -> RGB8 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA8 -> RGB565 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA8 -> RGB5A1 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA8 -> RGBA4 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB8 -> RGBA8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB8 -> RGB8 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB8 -> RGB565 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB8 -> RGB5A1 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB8 -> RGBA4 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB565 -> RGBA8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB565 -> RGB8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB565 -> RGB565 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB565 -> RGB5A1 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB565 -> RGBA4 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB5A1 -> RGBA8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB5A1 -> RGB8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB5A1 -> RGB565 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB5A1 -> RGB5A1 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB5A1 -> RGBA4 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA4 -> RGBA8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGBA4 -> RGB8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGBA4 -> RGB565 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA4 -> RGB5A1 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA4 -> RGBA4 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|} | |||
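The results in the table above follow a compact pattern: RGBA8 input converts to every format, RGB8 input only works with RGB8 output, and the three 16-bit formats convert freely among themselves but not to RGBA8 or RGB8. The lookup below encodes that observation; it is a summary of the table, not a statement about how the hardware decides, and the function name is hypothetical.

```python
FORMATS = ("RGBA8", "RGB8", "RGB565", "RGB5A1", "RGBA4")
SIXTEEN_BIT = {"RGB565", "RGB5A1", "RGBA4"}


def conversion_fires_interrupt(src: str, dst: str) -> bool:
    """True if the table lists src -> dst as completing with correct output."""
    if src not in FORMATS or dst not in FORMATS:
        raise ValueError("unknown color format name")
    if src == "RGBA8":
        return True               # RGBA8 converts to every listed format
    if src == "RGB8":
        return dst == "RGB8"      # RGB8 only works unconverted
    return dst in SIXTEEN_BIT     # 16-bit inputs only reach 16-bit outputs
```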
=== Tiled to tiled === | |||
Officially this is always used with 2x2 downscale; other configurations give glitched output. Hence, this is used for antialiasing and mipmap generation. | |||
The following constraints apply: | |||
* Output dimensions should not be bigger than input ones, otherwise the output is glitched. | |||
* Width dimensions must be >= 64. | |||
* Height dimensions must be >= 32. | |||
* Width dimensions are required to be aligned to 64 bytes when doing RGB8/RGBA8 transfers. | |||
** Otherwise they are required to be aligned to 128 bytes. | |||
Format conversion results: same as tiled->linear. | |||
=== TextureCopy === | === TextureCopy === | ||