GPU/External Registers: Difference between revisions
PPF rewrites + add T2L stuff |
Add hotspot profiling registers |
||
| (3 intermediate revisions by one other user not shown) | |||
| Line 57: | Line 57: | ||
| ? | | ? | ||
| Writes 0xFF2 on GPU init. | | Writes 0xFF2 on GPU init. | ||
|- | |||
| 0x1EF00064 | |||
| 0x10400064 | |||
| 0xC | |||
| [[#Hotspot Profiling|Hotspot Profiling]] registers | |||
|- | |- | ||
| 0x1EF000C0 | | 0x1EF000C0 | ||
| Line 144: | Line 149: | ||
These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]]. | These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]]. | ||
== Hotspot Profiling == | |||
{| class="wikitable" border="1" | |||
! User VA | |||
! Bits | |||
! Description | |||
|- | |||
| 0x1EF00064 | |||
| 0x00000001 | |||
| Enable bit | |||
|- | |||
| 0x1EF00068 | |||
| 0x0000FFFF | |||
| Interval count | |||
|- | |||
| 0x1EF00068 | |||
| 0xFFFF0000 | |||
| Interval length - 1 | |||
|- | |||
| 0x1EF0006C | |||
| 0xFFFFFFFF | |||
| Result FIFO (4 * u32) | |||
|} | |||
These registers provide a way to profile which parts of the GPU hardware are busy (working or stalling) the most during a given measuring interval. | |||
What exactly the numbers correspond to is unclear, but they are likely there to help developers identify bottlenecks in the rendering pipeline. | |||
The interval count is the number of consecutive intervals that will be recorded once measurement has started. | |||
When the interval count is set to 0, the measurement continues to run until the Result FIFO is read at least once. | |||
When measuring for longer than 0xFFFF intervals, the counters are reset to 0 when the total number of measurements overflows. | |||
In total there are 8 counters for different stages of the GPU pipeline. | |||
For each measurement interval, one GPU stage has its counter increased, so that after measurement the sum of all counters equals the interval count. | |||
The interval length is the number of GPU clock cycles that each measurement interval lasts; the register field stores this value minus 1. | |||
The GPU runs at 268 MHz, see [[Hardware#Common hardware|Common hardware]] for the exact frequency. | |||
<br> Note: for Interval length < 3, the stage that has its counter increased seems to always be the first one. This may need more testing. | |||
Writing 1 to the enable bit starts the measurement. | |||
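The field layout of 0x1EF00068 described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function names are hypothetical, and the clock constant uses the rounded 268 MHz figure rather than the exact frequency.

```python
# Rounded GPU clock; an assumption for illustration, not the exact frequency.
GPU_CLOCK_HZ = 268_000_000


def pack_interval_config(interval_count: int, interval_length: int) -> int:
    """Build the 32-bit value for 0x1EF00068 (hypothetical helper).

    Bits 0x0000FFFF hold the interval count.
    Bits 0xFFFF0000 hold the interval length minus 1.
    """
    if not 0 <= interval_count <= 0xFFFF:
        raise ValueError("interval count must fit in 16 bits")
    if not 1 <= interval_length <= 0x10000:
        raise ValueError("interval length - 1 must fit in 16 bits")
    return ((interval_length - 1) << 16) | interval_count


def interval_seconds(interval_length: int) -> float:
    """Approximate duration of one interval, given its length in GPU cycles."""
    return interval_length / GPU_CLOCK_HZ
```

For example, 100 intervals of 1000 cycles each would be written as 0x03E70064, and each interval would last roughly 3.7 microseconds at the rounded clock.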
The resulting data is obtained by reading from the Result FIFO 4 times. | |||
Each u32 word contains two u16 counters. | |||
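Splitting the four FIFO words into eight counters might look like the sketch below. The function name is hypothetical, and the output order (low half of each word first) is an assumption chosen for convenience.

```python
def unpack_counters(words):
    """Split four u32 FIFO words into eight u16 counters (hypothetical helper).

    Counters are returned low half first, then high half, for each word in turn.
    """
    if len(words) != 4:
        raise ValueError("expected exactly 4 words from the Result FIFO")
    counters = []
    for word in words:
        counters.append(word & 0xFFFF)          # bits 0x0000FFFF
        counters.append((word >> 16) & 0xFFFF)  # bits 0xFFFF0000
    return counters
```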
The table below contains educated guesses, based on some testing, at which hardware each counter corresponds to. | |||
{| class="wikitable" border="1" | |||
! Word | |||
! Bits | |||
! GPU Module | |||
! Reasoning | |||
|- | |||
| 0 | |||
| 0x0000FFFF | |||
| Array reads | |||
| Depends on input primitives and the size of vertex attributes, irrespective of what ends up on screen. | |||
|- | |||
| 0 | |||
| 0xFFFF0000 | |||
| Vertex shader | |||
| Can be increased by inserting more operations into the vertex shader. | |||
|- | |||
| 1 | |||
| 0x0000FFFF | |||
| Primitive setup / Culling | |||
| Sits between Rasterizer and Vertex shader and only slightly depends on what is on screen. | |||
|- | |||
| 1 | |||
| 0xFFFF0000 | |||
| Rasterizer | |||
| Depends on the number and size of triangles on screen. | |||
|- | |||
| 2 | |||
| 0x0000FFFF | |||
| Texture reads | |||
| Depends on the density and total amount of Texels on screen. | |||
|- | |||
| 2 | |||
| 0xFFFF0000 | |||
| Lighting calculations | |||
| Depends on enabled lighting settings. | |||
|- | |||
| 3 | |||
| 0x0000FFFF | |||
| Color combiners | |||
| Depends on enabled TexEnv stages. | |||
|- | |||
| 3 | |||
| 0xFFFF0000 | |||
| Framebuffer operations | |||
| Depends on area covered. | |||
|} | |||
What happens when the FIFO is read before a fixed-length measurement has completed has not been tested. | |||
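Since one counter is increased per interval, the counters can be read as the share of intervals attributed to each stage. The sketch below labels the counters with the guessed module names from the table; those names are guesses, not confirmed hardware assignments, and the function names are hypothetical.

```python
# Guessed module names from the table above, in word/bit order (unconfirmed).
STAGE_GUESSES = (
    "Array reads",
    "Vertex shader",
    "Primitive setup / Culling",
    "Rasterizer",
    "Texture reads",
    "Lighting calculations",
    "Color combiners",
    "Framebuffer operations",
)


def stage_shares(words):
    """Map each guessed stage name to its fraction of all recorded intervals."""
    if len(words) != 4:
        raise ValueError("expected exactly 4 words from the Result FIFO")
    counters = []
    for word in words:
        counters.append(word & 0xFFFF)          # low half of the word
        counters.append((word >> 16) & 0xFFFF)  # high half of the word
    total = sum(counters)
    if total == 0:
        return {name: 0.0 for name in STAGE_GUESSES}
    return {name: count / total for name, count in zip(STAGE_GUESSES, counters)}


def busiest_stage(words):
    """Return the guessed stage name with the largest counter."""
    shares = stage_shares(words)
    return max(shares, key=shares.get)
```

For a fixed-length measurement, the sum of the counters can also be compared against the configured interval count as a sanity check.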
== LCD Source Framebuffer Setup == | == LCD Source Framebuffer Setup == | ||
| Line 565: | Line 662: | ||
* Width dimensions are required to be aligned to 16 bytes when doing RGB8 transfers. | * Width dimensions are required to be aligned to 16 bytes when doing RGB8 transfers. | ||
** Otherwise they are required to be aligned to 8 bytes. | ** Otherwise they are required to be aligned to 8 bytes. | ||
* If downscale is used, input and output dimensions should be the same, and width/2 must also follow alignment constraints. | * If downscale is used, input and output dimensions should be the same (otherwise the output is glitched), and width/2 must also follow alignment constraints. | ||
Format conversion results: | Format conversion results: | ||
| Line 610: | Line 707: | ||
|- | |- | ||
| RGB565 -> RGB565 | | RGB565 -> RGB565 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|- | |- | ||
| RGB565 -> RGB5A1 | | RGB565 -> RGB5A1 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|- | |- | ||
| RGB565 -> RGBA4 | | RGB565 -> RGBA4 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|- | |- | ||
| RGB5A1 -> RGBA8 | | RGB5A1 -> RGBA8 | ||
| Line 625: | Line 722: | ||
|- | |- | ||
| RGB5A1 -> RGB565 | | RGB5A1 -> RGB565 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|- | |- | ||
| RGB5A1 -> RGB5A1 | | RGB5A1 -> RGB5A1 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|- | |- | ||
| RGB5A1 -> RGBA4 | | RGB5A1 -> RGBA4 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|- | |- | ||
| RGBA4 -> RGBA8 | | RGBA4 -> RGBA8 | ||
| Line 640: | Line 737: | ||
|- | |- | ||
| RGBA4 -> RGB565 | | RGBA4 -> RGB565 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|- | |- | ||
| RGBA4 -> RGB5A1 | | RGBA4 -> RGB5A1 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|- | |- | ||
| RGBA4 -> RGBA4 | | RGBA4 -> RGBA4 | ||
| style="background: | | style="background: lightgreen" | Has interrupt, correct output | ||
|} | |} | ||
=== Tiled to tiled === | |||
Officially this is always used with 2x2 downscale; other configurations give glitched output. Hence, this is used for antialiasing and mipmap generation. | |||
The following constraints apply: | |||
* Output dimensions should not be bigger than input ones, otherwise the output is glitched. | |||
* Width dimensions must be >= 64. | |||
* Height dimensions must be >= 32. | |||
* Width dimensions are required to be aligned to 64 bytes when doing RGB8/RGBA8 transfers. | |||
** Otherwise they are required to be aligned to 128 bytes. | |||
Format conversion results: same as tiled->linear. | |||
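The tiled to tiled constraints listed above can be expressed as a simple validity check. This is a sketch only: the function name and format strings are hypothetical, the size and alignment rules are assumed to apply to both input and output widths, and width values are taken in the same unit the list uses.

```python
def check_tiled_to_tiled(in_w, in_h, out_w, out_h, fmt):
    """Return a list of violated constraints (empty if none); hypothetical helper."""
    problems = []
    if out_w > in_w or out_h > in_h:
        problems.append("output dimensions are bigger than input dimensions")
    # Assumption: the minimum size rules apply to input and output alike.
    for label, width in (("input", in_w), ("output", out_w)):
        if width < 64:
            problems.append(f"{label} width is below 64")
    for label, height in (("input", in_h), ("output", out_h)):
        if height < 32:
            problems.append(f"{label} height is below 32")
    # Alignment as stated in the list: 64 for RGB8/RGBA8, 128 otherwise.
    alignment = 64 if fmt in ("RGB8", "RGBA8") else 128
    for label, width in (("input", in_w), ("output", out_w)):
        if width % alignment != 0:
            problems.append(f"{label} width is not aligned to {alignment}")
    return problems
```

An empty result only means that none of the listed constraints is violated; it does not guarantee correct output, since configurations other than 2x2 downscale are reported as glitched.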
=== TextureCopy === | === TextureCopy === | ||