GPU/External Registers: Difference between revisions
m Update T2T constraints |
Add hotspot profiling registers |
||
| Line 57: | Line 57: | ||
| ? | | ? | ||
| Writes 0xFF2 on GPU init. | | Writes 0xFF2 on GPU init. | ||
|- | |||
| 0x1EF00064 | |||
| 0x10400064 | |||
| 0xC | |||
| [[#Hotspot Profiling|Hotspot Profiling]] registers | |||
|- | |- | ||
| 0x1EF000C0 | | 0x1EF000C0 | ||
| Line 144: | Line 149: | ||
These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]]. | These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]]. | ||
== Hotspot Profiling == | |||
{| class="wikitable" border="1" | |||
! User VA | |||
! Bits | |||
! Description | |||
|- | |||
| 0x1EF00064 | |||
| 0x00000001 | |||
| Enable bit | |||
|- | |||
| 0x1EF00068 | |||
| 0x0000FFFF | |||
| Interval count | |||
|- | |||
| 0x1EF00068 | |||
| 0xFFFF0000 | |||
| Interval length - 1 | |||
|- | |||
| 0x1EF0006C | |||
| 0xFFFFFFFF | |||
| Result FIFO (4 * u32) | |||
|} | |||
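As a minimal sketch of how the two fields at 0x1EF00068 fit together, the following Python helpers pack and unpack the 32-bit register value. The function and constant names are hypothetical and not part of any SDK; only the bit layout (interval count in the low 16 bits, interval length minus one in the high 16 bits) is taken from the table above.

```python
# Hypothetical helpers; only the bit layout comes from the register table.
INTERVAL_COUNT_MASK = 0x0000FFFF
INTERVAL_LENGTH_SHIFT = 16


def pack_interval_config(interval_count, interval_length):
    """Build the 32-bit value for 0x1EF00068.

    interval_count: number of intervals to record (0..0xFFFF).
    interval_length: GPU clock cycles per interval (1..0x10000);
        the register stores this value minus one.
    """
    if not 0 <= interval_count <= 0xFFFF:
        raise ValueError("interval_count must fit in 16 bits")
    if not 1 <= interval_length <= 0x10000:
        raise ValueError("interval_length must be in 1..0x10000")
    return ((interval_length - 1) << INTERVAL_LENGTH_SHIFT) | interval_count


def unpack_interval_config(value):
    """Return (interval_count, interval_length) from a register value."""
    interval_count = value & INTERVAL_COUNT_MASK
    interval_length = ((value >> INTERVAL_LENGTH_SHIFT) & 0xFFFF) + 1
    return interval_count, interval_length
```

For example, 100 intervals of 1000 cycles each would be encoded as `pack_interval_config(100, 1000)`, which yields 0x03E70064.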
These registers make it possible to profile which parts of the GPU hardware are busiest (working or stalling the most) during a measurement interval. | |||
What exactly the counter values correspond to is unclear, but they are likely intended to help developers identify bottlenecks in the rendering pipeline. | |||
The interval count is the number of consecutive intervals that will be recorded once measurement has started. | |||
When setting the interval count to 0, the measurement will continue to run until the Result FIFO is read at least once. | |||
When measuring for more than 0xFFFF intervals, the counters are reset to 0 when the total number of measurements overflows. | |||
In total there are 8 counters for different stages of the GPU pipeline. | |||
For each measurement interval, one GPU stage has its counter increased, so that after measurement the sum of all counters equals the interval count. | |||
The interval length is the number of GPU clock cycles that each measurement interval lasts. | |||
The GPU runs at roughly 268 MHz; see [[Hardware#Common hardware|Common hardware]] for the exact frequency. | |||
<br> Note: for an interval length < 3, the stage that has its counter increased seems to always be the first one. This may need more testing. | |||
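To get a feel for the time scales involved, the sketch below estimates how long a fixed-length measurement takes. It assumes that the total duration is simply interval count times interval length in GPU cycles, and it uses a nominal 268 MHz clock rather than the exact frequency; the names are hypothetical.

```python
# Nominal GPU clock (assumption); see Common hardware for the exact value.
GPU_CLOCK_HZ = 268_000_000


def measurement_duration_seconds(interval_count, interval_length,
                                 clock_hz=GPU_CLOCK_HZ):
    """Estimate the wall-clock duration of a fixed-length measurement.

    Assumes the intervals run back to back with no gaps, so the total
    is interval_count * interval_length GPU clock cycles.
    """
    total_cycles = interval_count * interval_length
    return total_cycles / clock_hz
```

Under these assumptions, 1000 intervals of 268 cycles each last about one millisecond.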
Writing 1 to the enable bit starts the measurement. | |||
The resulting data is obtained by reading from the Result FIFO 4 times. | |||
Each u32 word contains two u16 counters. | |||
The table below contains educated guesses, based on some testing, as to which hardware each counter corresponds to. | |||
{| class="wikitable" border="1" | |||
! Word | |||
! Bits | |||
! GPU Module | |||
! Reasoning | |||
|- | |||
| 0 | |||
| 0x0000FFFF | |||
| Array reads | |||
| Depends on input primitives and the size of vertex attributes, irrespective of what ends up on screen. | |||
|- | |||
| 0 | |||
| 0xFFFF0000 | |||
| Vertex shader | |||
| Can be increased by inserting more operations into the vertex shader. | |||
|- | |||
| 1 | |||
| 0x0000FFFF | |||
| Primitive setup / Culling | |||
| Sits between Rasterizer and Vertex shader and only slightly depends on what is on screen. | |||
|- | |||
| 1 | |||
| 0xFFFF0000 | |||
| Rasterizer | |||
| Depends on the number and size of triangles on screen. | |||
|- | |||
| 2 | |||
| 0x0000FFFF | |||
| Texture reads | |||
| Depends on the density and total number of texels on screen. | |||
|- | |||
| 2 | |||
| 0xFFFF0000 | |||
| Lighting calculations | |||
| Depends on enabled lighting settings. | |||
|- | |||
| 3 | |||
| 0x0000FFFF | |||
| Color combiners | |||
| Depends on enabled TexEnv stages. | |||
|- | |||
| 3 | |||
| 0xFFFF0000 | |||
| Framebuffer operations | |||
| Depends on area covered. | |||
|} | |||
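The following sketch shows one way to split the four words read from the Result FIFO into the eight u16 counters listed above. The stage labels mirror the educated guesses in the table and are not confirmed hardware names; the function names are hypothetical.

```python
# Labels follow the guessed mapping in the table above (unconfirmed).
STAGE_NAMES = (
    "array_reads", "vertex_shader",         # word 0: low half, high half
    "primitive_setup", "rasterizer",        # word 1
    "texture_reads", "lighting",            # word 2
    "color_combiners", "framebuffer_ops",   # word 3
)


def decode_fifo_words(words):
    """Split four u32 FIFO words into eight labeled u16 counters."""
    if len(words) != 4:
        raise ValueError("expected exactly 4 words from the Result FIFO")
    counters = {}
    for index, word in enumerate(words):
        counters[STAGE_NAMES[2 * index]] = word & 0xFFFF
        counters[STAGE_NAMES[2 * index + 1]] = (word >> 16) & 0xFFFF
    return counters


def busiest_stage(counters):
    """Return the label of the stage with the highest counter."""
    return max(counters, key=counters.get)
```

For a completed fixed-length measurement, `sum(counters.values())` should equal the configured interval count, which makes for a simple sanity check on the decoded data.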
What happens when the FIFO is read before a fixed-length measurement has completed has not been tested. | |||
== LCD Source Framebuffer Setup == | == LCD Source Framebuffer Setup == | ||