GPU/External Registers: Difference between revisions

Kynex7510 (talk | contribs)
PPF rewrites + add T2L stuff
Made (talk | contribs)
Add hotspot profiling registers
 
(3 intermediate revisions by one other user not shown)
Line 57: Line 57:
| ?
| ?
| Writes 0xFF2 on GPU init.
| Writes 0xFF2 on GPU init.
|-
| 0x1EF00064
| 0x10400064
| 0xC
| [[#Hotspot Profiling|Hotspot Profiling]] registers
|-
|-
| 0x1EF000C0
| 0x1EF000C0
Line 144: Line 149:


These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]].
These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]].
== Hotspot Profiling ==
{| class="wikitable" border="1"
! User VA
! Bits
! Description
|-
| 0x1EF00064
| 0x00000001
| Enable bit
|-
| 0x1EF00068
| 0x0000FFFF
| Interval count
|-
| 0x1EF00068
| 0xFFFF0000
| Interval length - 1
|-
| 0x1EF0006C
| 0xFFFFFFFF
| Result FIFO (4 * u32)
|}
These registers provide a way to profile which parts of the GPU hardware are busy, working, or stalling the most during a measurement interval.
Exactly what the counter values represent is unclear, but the feature likely exists to help developers identify bottlenecks in the rendering pipeline.
The interval count is the number of intervals that will be recorded in a row once measurement has started.
When the interval count is set to 0, the measurement continues to run until the Result FIFO is read at least once.
When measuring for longer than 0xFFFF intervals, the counters are reset to 0 when the total number of measurements overflows.
In total there are 8 counters for different stages of the GPU pipeline.
For each measurement interval, one GPU stage has its counter increased, so that after measurement the sum of all counters equals the interval count.
The interval length is the number of GPU clock cycles that each measurement interval lasts.
The GPU runs at roughly 268 MHz; see [[Hardware#Common hardware|Common hardware]] for the exact frequency.
<br> Note: for an interval length < 3, the stage that has its counter increased seems to always be the first one. This may need more testing.
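
As a rough illustration of the field layout described above, the following Python sketch packs an interval count and an interval length into the value for 0x1EF00068 and estimates how long a fixed-length measurement lasts. The function names and the rounded 268 MHz constant are assumptions made for this example only; they do not come from any official SDK.

```python
# Approximate GPU clock; the exact frequency is listed under Common hardware.
GPU_CLOCK_HZ = 268_000_000


def pack_interval_config(interval_count: int, interval_length: int) -> int:
    """Build the 32-bit value for 0x1EF00068 (hypothetical helper).

    interval_count  -> bits 0-15 (0 means "run until the FIFO is read")
    interval_length -> length in GPU cycles; the register stores length - 1
                       in bits 16-31
    """
    if not 0 <= interval_count <= 0xFFFF:
        raise ValueError("interval count must fit in 16 bits")
    if not 1 <= interval_length <= 0x10000:
        raise ValueError("interval length must be 1..0x10000 cycles")
    return ((interval_length - 1) << 16) | interval_count


def measurement_seconds(interval_count: int, interval_length: int,
                        clock_hz: int = GPU_CLOCK_HZ) -> float:
    """Estimated wall-clock duration of a fixed-length measurement."""
    return interval_count * interval_length / clock_hz


if __name__ == "__main__":
    value = pack_interval_config(1000, 268)
    print(hex(value))
    print(measurement_seconds(1000, 268))
```

With these inputs, 1000 intervals of 268 cycles each cover roughly one millisecond of GPU time at the assumed clock rate.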
Writing 1 to the enable bit starts the measurement.
The resulting data is obtained by reading from the Result FIFO 4 times.
Each u32 word contains two u16 counters.
The table below contains educated guesses, based on some testing, about which hardware each counter corresponds to.
{| class="wikitable" border="1"
! Word
! Bits
! GPU Module
! Reasoning
|-
| 0
| 0x0000FFFF
| Array reads
| Depends on input primitives and the size of vertex attributes, irrespective of what ends up on screen.
|-
| 0
| 0xFFFF0000
| Vertex shader
| Can be increased by inserting more operations into the vertex shader.
|-
| 1
| 0x0000FFFF
| Primitive setup / Culling
| Sits between Rasterizer and Vertex shader and only slightly depends on what is on screen.
|-
| 1
| 0xFFFF0000
| Rasterizer
| Depends on the number and size of triangles on screen.
|-
| 2
| 0x0000FFFF
| Texture reads
| Depends on the density and total amount of Texels on screen.
|-
| 2
| 0xFFFF0000
| Lighting calculations
| Depends on enabled lighting settings.
|-
| 3
| 0x0000FFFF
| Color combiners
| Depends on enabled TexEnv stages.
|-
| 3
| 0xFFFF0000
| Framebuffer operations
| Depends on area covered.
|}
The behavior when reading the FIFO before a fixed-length measurement has completed has not been tested.
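
The word layout in the table above can be decoded as in the following Python sketch. It takes the four u32 values read from the Result FIFO and splits each into its low and high u16 counter. The counter names mirror the guessed module names from the table and are therefore just as tentative; the helper names are hypothetical.

```python
# Guessed module names, in FIFO order: low half then high half of each word.
COUNTER_NAMES = [
    "array_reads", "vertex_shader",
    "primitive_setup", "rasterizer",
    "texture_reads", "lighting",
    "color_combiners", "framebuffer_ops",
]


def decode_hotspot_words(words):
    """Split four u32 FIFO words into eight named u16 counters."""
    if len(words) != 4:
        raise ValueError("expected exactly 4 words from the Result FIFO")
    counters = {}
    for i, word in enumerate(words):
        counters[COUNTER_NAMES[2 * i]] = word & 0xFFFF               # bits 0-15
        counters[COUNTER_NAMES[2 * i + 1]] = (word >> 16) & 0xFFFF   # bits 16-31
    return counters


def busiest_stage(counters):
    """Return the name of the counter with the highest value."""
    return max(counters, key=counters.get)


if __name__ == "__main__":
    sample = [0x00020001, 0x00040003, 0x00060005, 0x00080007]  # made-up data
    decoded = decode_hotspot_words(sample)
    print(decoded)
    print(busiest_stage(decoded), sum(decoded.values()))
```

Since one counter is increased per interval, the sum of the decoded values should match the configured interval count for a completed fixed-length measurement, which makes a convenient sanity check.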


== LCD Source Framebuffer Setup ==
== LCD Source Framebuffer Setup ==
Line 565: Line 662:
* Width dimensions are required to be aligned to 16 bytes when doing RGB8 transfers.
* Width dimensions are required to be aligned to 16 bytes when doing RGB8 transfers.
** Otherwise they are required to be aligned to 8 bytes.
** Otherwise they are required to be aligned to 8 bytes.
* If downscale is used, input and output dimensions should be the same, and width/2 must also follow alignment constraints.
* If downscale is used, input and output dimensions should be the same (otherwise the output is glitched), and width/2 must also follow alignment constraints.


Format conversion results:
Format conversion results:
Line 610: Line 707:
|-
|-
| RGB565 -> RGB565
| RGB565 -> RGB565
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|-
|-
| RGB565 -> RGB5A1
| RGB565 -> RGB5A1
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|-
|-
| RGB565 -> RGBA4
| RGB565 -> RGBA4
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|-
|-
| RGB5A1 -> RGBA8
| RGB5A1 -> RGBA8
Line 625: Line 722:
|-
|-
| RGB5A1 -> RGB565
| RGB5A1 -> RGB565
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|-
|-
| RGB5A1 -> RGB5A1
| RGB5A1 -> RGB5A1
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|-
|-
| RGB5A1 -> RGBA4
| RGB5A1 -> RGBA4
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|-
|-
| RGBA4 -> RGBA8
| RGBA4 -> RGBA8
Line 640: Line 737:
|-
|-
| RGBA4 -> RGB565
| RGBA4 -> RGB565
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|-
|-
| RGBA4 -> RGB5A1
| RGBA4 -> RGB5A1
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|-
|-
| RGBA4 -> RGBA4
| RGBA4 -> RGBA4
| style="background: yellow" | Has interrupt, output not tested
| style="background: lightgreen" | Has interrupt, correct output
|}
|}
=== Tiled to tiled ===
Officially, this is always used with 2x2 downscale; other configurations give glitched output. Hence, it is used for antialiasing and mipmap generation.
The following constraints apply:
* Output dimensions should not be bigger than input ones, otherwise the output is glitched.
* Width dimensions must be >= 64.
* Height dimensions must be >= 32.
* Width dimensions are required to be aligned to 64 bytes when doing RGB8/RGBA8 transfers.
** Otherwise they are required to be aligned to 128 bytes.
Format conversion results: same as tiled->linear.
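
The tiled-to-tiled constraints listed above could be checked before issuing a transfer with something like the following Python sketch. The function name and format strings are hypothetical. The sketch also assumes that the alignment values apply directly to the width value as passed in and that the minimum sizes apply to both input and output; the list above does not state this precisely, so treat both as unverified.

```python
def check_tiled_to_tiled(in_w, in_h, out_w, out_h, fmt):
    """Return a list of constraint violations (empty if none were found)."""
    problems = []
    # Output must not exceed input in either dimension.
    if out_w > in_w or out_h > in_h:
        problems.append("output dimensions are bigger than input dimensions")
    # Minimum sizes, assumed to apply to both input and output.
    if min(in_w, out_w) < 64:
        problems.append("width must be >= 64")
    if min(in_h, out_h) < 32:
        problems.append("height must be >= 32")
    # 64 for RGB8/RGBA8, 128 for every other format (units are an assumption).
    align = 64 if fmt in ("RGB8", "RGBA8") else 128
    if in_w % align or out_w % align:
        problems.append(f"width must be aligned to {align}")
    return problems


if __name__ == "__main__":
    print(check_tiled_to_tiled(256, 256, 128, 128, "RGBA8"))
    print(check_tiled_to_tiled(192, 64, 64, 32, "RGB565"))
```

An empty list only means that none of the listed constraints were violated; it does not guarantee correct output for configurations other than 2x2 downscale.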


=== TextureCopy ===
=== TextureCopy ===