GPU/External Registers: Difference between revisions
m progressive scan interlacing == interleaving |
m Update T2T constraints |
||
(13 intermediate revisions by 3 users not shown) | |||
Line 9: | Line 9: | ||
! Name | ! Name | ||
! Comments | ! Comments | ||
|- | |||
| 0x1EF00000 | |||
| 0x10400000 | |||
| 4 | |||
| Hardware ID | |||
| Bit2: new model | |||
|- | |- | ||
| 0x1EF00004 | | 0x1EF00004 | ||
Line 134: | Line 140: | ||
Memory fills are used to initialize buffers in memory with a given value, similar to memset. A memory fill is triggered by setting bit0 in the control register. Doing so aborts any running memory fills on that filling unit. Upon completion, the hardware unsets bit0 and sets bit1 and fires interrupt PSC0. | Memory fills are used to initialize buffers in memory with a given value, similar to memset. A memory fill is triggered by setting bit0 in the control register. Doing so aborts any running memory fills on that filling unit. Upon completion, the hardware unsets bit0 and sets bit1 and fires interrupt PSC0. | ||
The addresses must be part of VRAM. | |||
These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]]. | These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]]. | ||
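The control-register handshake described above can be modelled as a small sketch. The names `FILL_BUSY`, `FILL_FINISHED`, `start_fill`, `complete_fill` and `fill_state` are hypothetical and only illustrate the bit0/bit1 behaviour; this is not a driver, and it does not model the PSC0 interrupt.

```python
FILL_BUSY = 1 << 0      # bit0: written by software to trigger a fill
FILL_FINISHED = 1 << 1  # bit1: set by hardware when the fill completes


def start_fill(control: int) -> int:
    """Return the control value software would write to trigger a fill."""
    return control | FILL_BUSY


def complete_fill(control: int) -> int:
    """Model the hardware side: unset bit0 and set bit1 on completion."""
    return (control & ~FILL_BUSY) | FILL_FINISHED


def fill_state(control: int) -> str:
    """Classify a control value read back from the register."""
    if control & FILL_BUSY:
        return "running"
    if control & FILL_FINISHED:
        return "finished"
    return "idle"
```

A caller polling instead of waiting for the interrupt would loop until `fill_state` stops returning "running".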
Line 235: | Line 243: | ||
VClock = PClock / (HTotal + 1) / (VTotal + 1) | VClock = PClock / (HTotal + 1) / (VTotal + 1) | ||
Setting this to 494 lowers framerate to about 50.040660858 Hz ((268111856 / 24) / ( | Setting this to 494 lowers framerate to about 50.040660858 Hz ((268111856 / 24) / (450 + 1) / (494 + 1)). | ||
|- | |- | ||
| 0x28 | | 0x28 | ||
Line 372: | Line 380: | ||
|- | |- | ||
| 5-4 | | 5-4 | ||
| Framebuffer | | Framebuffer interlacing mode | ||
0 - A ( | 0 - A (no interlacing) | ||
1 - AA ( | 1 - AA (scanline doubling) | ||
2 - AB ( | 2 - AB (interlace enable) | ||
3 - BA (same as above, but the | 3 - BA (same as above, but the fields are inverted) | ||
In AB and BA interlace modes, a scanline from each framebuffer is output in an alternating manner. In AB mode, Framebuffer A is output on the first display scanline. Similarly, in BA mode, Framebuffer B gets output to the first display scanline. | |||
AB and BA modes work as follows: a scanline is output, the framebuffer stride value is added to the internal scanline pointer value, and the other framebuffer is selected. This alternates until the end of the draw region. | |||
AA interlacing works like AB interlacing, except both internal framebuffer pointers are set to the Framebuffer A pointer value. | |||
In A mode (no interlacing), the hardware doesn't switch to the other framebuffer after outputting a scanline to the display. | |||
Bottom screen has this set to 0 (A mode, no interlacing) at all times. | |||
Top screen uses AB interlacing in 3D mode (with 3D slider enabled), and A mode (no interlacing) in 2D mode. | |||
|- | |- | ||
| 6 | | 6 | ||
| | | Alternative pixel output mode* | ||
|- | |- | ||
| 7 | | 7 | ||
Line 393: | Line 408: | ||
| DMA size | | DMA size | ||
0 - 4 words (32 bytes) | 0 - 4 FCRAM words (32 bytes) | ||
1 - 8 words (64 bytes) | 1 - 8 FCRAM words (64 bytes) | ||
2 - 16 words (128 bytes) | 2 - 16 FCRAM words (128 bytes) | ||
3 - ??? | 3 - ??? | ||
Line 404: | Line 419: | ||
|} | |} | ||
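The scanline selection described for the interlacing modes (bits 5-4) in the table above can be sketched as a small simulation. The function name `scanline_addresses` and its parameters are hypothetical. The sketch assumes the stride is added to the pointer of the framebuffer that was just output, which is one reading of the description.

```python
def scanline_addresses(mode, lines, stride, fb_a, fb_b):
    """Return the source address of each output scanline.

    mode: 0 = A (no interlacing), 1 = AA (scanline doubling),
          2 = AB (interlace), 3 = BA (interlace, fields inverted).
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0..3")
    if mode == 1:
        # AA: both internal pointers start at the Framebuffer A address.
        fb_b = fb_a
    pointers = [fb_a, fb_b]
    current = 1 if mode == 3 else 0  # BA starts with the second pointer
    out = []
    for _ in range(lines):
        out.append(pointers[current])
        pointers[current] += stride  # advance the pointer just used
        if mode != 0:
            current ^= 1             # select the other framebuffer
    return out
```

With `stride=0x10`, `fb_a=0x100` and `fb_b=0x200`, AB mode yields addresses alternating between the two buffers, while AA mode yields each line of Framebuffer A twice.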
* The weird thing about | |||
<nowiki>*</nowiki> The weird thing about bit6 is that it works differently on the bottom and top LCDs. On the bottom LCD, it doubles the number of outputted pixels (so the same pixel is outputted twice, effectively doing pixel/column doubling). On the top screen, however, it does scanline doubling instead. | |||
Most likely the top screen receives two pixels at once per clock unit, outputting two scanlines simultaneously. | |||
On a 2DS, it seems to have no effect on the top part of the display, and on the bottom screen it just shifts the framebuffer to the right two pixels. | On a 2DS, it seems to have no effect on the top part of the display, and on the bottom screen it just shifts the framebuffer to the right two pixels. | ||
GSP module only allows the LCD stereoscopy (3D) to be enabled when bit5=1 and bit6=0 here. When GSP module updates this register, GSP module will automatically disable the stereoscopy if those bits are not set for enabling stereoscopy. | |||
When both interlacing and alternative mode are disabled (bit5=0, bit6=0), the full resolution of the top screen (240x800) can be utilized if the PDC registers are updated to accommodate this higher resolution. GSP contains tables for this mode (gsp mode == 1). GSP automatically applies this mode if both bit5 and bit6 are cleared. This is also the default, and the only valid mode for the bottom screen in userland. | |||
If only AB interlacing is enabled (bit5=1, bit6=0), gsp detects this as a request to switch to 3D mode (gsp mode == 2), and enables the parallax barrier. | |||
It's unknown how to control this, but some other PDC registers control whether interlacing is done by true interleaving (both framebuffers are treated as 240x400), or by skipping lines (both framebuffers are treated as 240x800). | |||
If only | If only alternative mode is enabled (bit5=0, bit6=1), gsp detects it as a request to switch back to 2D mode for the top screen (gsp mode == 0). This is also the default mode for the top screen. | ||
Both interlacing and scan doubling can't be enabled in usermode, but it works as expected in baremetal. | Both interlacing and scan doubling can't be enabled in usermode, but it works as expected in baremetal. | ||
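The way GSP maps bit5 and bit6 to its display modes, as described above, can be summarised in a short sketch. The function name `gsp_mode` is hypothetical, and returning `None` for the combination that usermode cannot set is a choice made for illustration only.

```python
def gsp_mode(format_reg):
    """Return the gsp mode implied by bit5/bit6, or None if not allowed in usermode."""
    bit5 = (format_reg >> 5) & 1  # high bit of the interlacing field (AB/BA)
    bit6 = (format_reg >> 6) & 1  # alternative pixel output mode
    if bit5 and bit6:
        return None  # both set: only usable in baremetal
    if bit5:
        return 2     # 3D mode, parallax barrier enabled
    if bit6:
        return 0     # 2D mode for the top screen (default)
    return 1         # full-resolution 240x800 mode


def stereoscopy_allowed(format_reg):
    """GSP only allows LCD stereoscopy when bit5=1 and bit6=0."""
    return gsp_mode(format_reg) == 2
```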
Line 457: | Line 475: | ||
|- | |- | ||
| 0x1EF00C08 | | 0x1EF00C08 | ||
| DisplayTransfer output width (bits 0-15) and height (bits 16-31) | | DisplayTransfer output width (bits 0-15) and height (bits 16-31) | ||
|- | |- | ||
| 0x1EF00C0C | | 0x1EF00C0C | ||
| DisplayTransfer input width and height | | DisplayTransfer input width and height | ||
|- | |- | ||
| 0x1EF00C10 | | 0x1EF00C10 | ||
| Transfer flags | | Transfer flags | ||
|- | |- | ||
| 0x1EF00C14 | | 0x1EF00C14 | ||
| GSP | | ?, GSP writes value 0 here prior to writing to 0x1EF00C18 for DisplayTransfer | ||
|- | |- | ||
| 0x1EF00C18 | | 0x1EF00C18 | ||
| Setting bit0 starts the transfer | | Setting bit0 starts the transfer; upon completion, bit0 is unset and bit8 is set | ||
|- | |- | ||
| 0x1EF00C1C | | 0x1EF00C1C | ||
Line 475: | Line 493: | ||
|- | |- | ||
| 0x1EF00C20 | | 0x1EF00C20 | ||
| TextureCopy total amount of data to copy, in bytes | | TextureCopy total amount of data to copy, in bytes | ||
|- | |- | ||
| 0x1EF00C24 | | 0x1EF00C24 | ||
| TextureCopy input line width (bits 0-15) and gap (bits 16-31), in 16 byte units | | TextureCopy input line width (bits 0-15) and gap (bits 16-31), in 16 byte units | ||
|- | |- | ||
| 0x1EF00C28 | | 0x1EF00C28 | ||
| TextureCopy output line width and gap | | TextureCopy output line width and gap | ||
|} | |} | ||
Transfer flags: | |||
{| class="wikitable" border="1" | {| class="wikitable" border="1" | ||
! Bit | ! Bit | ||
Line 492: | Line 509: | ||
|- | |- | ||
| 0 | | 0 | ||
| When set, the framebuffer data is flipped vertically | | When set, the framebuffer data is flipped vertically | ||
|- | |- | ||
| 1 | | 1 | ||
| | | Linear->tiled mode (overrides tiled->linear mode) | ||
|- | |- | ||
| 2 | | 2 | ||
| This bit is required when the output width is less than the input width for the hardware to properly crop the lines, otherwise the output will be mis-aligned | | This bit is required when the output width is less than the input width for the hardware to properly crop the lines, otherwise the output will be mis-aligned | ||
|- | |- | ||
| 3 | | 3 | ||
| | | TextureCopy mode (overrides all other modes) | ||
|- | |- | ||
| 4 | | 4 | ||
Line 507: | Line 524: | ||
|- | |- | ||
| 5 | | 5 | ||
| | | Tiled->tiled mode (overrides tiled->linear, linear->tiled modes) | ||
|- | |- | ||
| 7-6 | | 7-6 | ||
Line 513: | Line 530: | ||
|- | |- | ||
| 10-8 | | 10-8 | ||
| Input | | Input [[GPU/External_Registers#Framebuffer_color_formats|color format]] | ||
|- | |- | ||
| 11 | | 11 | ||
Line 519: | Line 536: | ||
|- | |- | ||
| 14-12 | | 14-12 | ||
| Output | | Output color format | ||
|- | |- | ||
| 15 | | 15 | ||
Line 525: | Line 542: | ||
|- | |- | ||
| 16 | | 16 | ||
| Use 32x32 block tiling mode, instead of the usual 8x8 one | | Use 32x32 block tiling mode, instead of the usual 8x8 one (output dimensions must be multiples of 32, even if cropping with bit 2 set above) | ||
|- | |- | ||
| 17-23 | | 17-23 | ||
Line 531: | Line 548: | ||
|- | |- | ||
| 24-25 | | 24-25 | ||
| Scale down the input image using a box filter | | Scale down the input image using a box filter (0 = No downscale, 1 = 2x1 downscale, 2 = 2x2 downscale, 3 = invalid) | ||
|- | |- | ||
| 31-26 | | 31-26 | ||
| Not writable | | Not writable | ||
|} | |} | ||
These registers are used by [[GSP_Shared_Memory#Commands|GSP]] for DisplayTransfer and TextureCopy. TextureCopy registers are only used in TextureCopy mode; likewise, DisplayTransfer registers are only used when TextureCopy mode is not set. By default, DisplayTransfer will work in tiled->linear mode. | |||
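As a sketch of the transfer flags layout and the mode override rules listed above, the following helpers pack a flags word and report which mode would take effect. The names `pack_transfer_flags` and `transfer_mode` are hypothetical. Color formats are passed as raw 3-bit values, and bits whose meaning is not given in the table above are left at 0.

```python
def pack_transfer_flags(*, flip_v=False, linear_to_tiled=False, crop=False,
                        texture_copy=False, tiled_to_tiled=False,
                        in_format=0, out_format=0, tile_32x32=False,
                        downscale=0):
    """Build the transfer flags word from the fields in the table."""
    if not 0 <= in_format <= 7 or not 0 <= out_format <= 7:
        raise ValueError("color formats are 3-bit fields")
    if downscale not in (0, 1, 2):
        raise ValueError("downscale 3 is invalid")
    return (int(flip_v) << 0
            | int(linear_to_tiled) << 1
            | int(crop) << 2
            | int(texture_copy) << 3
            | int(tiled_to_tiled) << 5
            | in_format << 8
            | out_format << 12
            | int(tile_32x32) << 16
            | downscale << 24)


def transfer_mode(flags):
    """Apply the override order: TextureCopy, tiled->tiled, linear->tiled."""
    if flags & (1 << 3):
        return "TextureCopy"
    if flags & (1 << 5):
        return "tiled->tiled"
    if flags & (1 << 1):
        return "linear->tiled"
    return "tiled->linear"  # default DisplayTransfer mode
```

For example, `pack_transfer_flags(flip_v=True, in_format=1, out_format=2, downscale=2)` sets bit 0, bits 10-8, bits 14-12 and bits 25-24.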
=== Tiled to linear === | |||
Unswizzles the input buffer. This is usually used for transferring GPU framebuffer data onto LCD framebuffers. The following constraints apply: | |||
* Output dimensions must not be bigger than input ones. | |||
* Width dimensions must be >= 64. | |||
* Height dimensions must be >= 16. | |||
* Width dimensions are required to be aligned to 16 bytes when doing RGB8 transfers. | |||
** Otherwise they are required to be aligned to 8 bytes. | |||
* If downscale is used, input and output dimensions should be the same (otherwise the output is glitched), and width/2 must also follow alignment constraints. | |||
Format conversion results: | |||
{| class="wikitable" border="1" | |||
! Conversion | |||
! Result | |||
|- | |||
| RGBA8 -> RGBA8 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA8 -> RGB8 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA8 -> RGB565 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA8 -> RGB5A1 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA8 -> RGBA4 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB8 -> RGBA8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB8 -> RGB8 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB8 -> RGB565 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB8 -> RGB5A1 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB8 -> RGBA4 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB565 -> RGBA8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB565 -> RGB8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB565 -> RGB565 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB565 -> RGB5A1 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB565 -> RGBA4 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB5A1 -> RGBA8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB5A1 -> RGB8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGB5A1 -> RGB565 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB5A1 -> RGB5A1 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGB5A1 -> RGBA4 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA4 -> RGBA8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGBA4 -> RGB8 | |||
| style="background: salmon" | No interrupt | |||
|- | |||
| RGBA4 -> RGB565 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA4 -> RGB5A1 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|- | |||
| RGBA4 -> RGBA4 | |||
| style="background: lightgreen" | Has interrupt, correct output | |||
|} | |||
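The conversion table above follows a simple pattern, which can be captured in a small lookup sketch. The function name `conversion_works` is hypothetical. It only restates the observed results in the table and makes no claim about why the hardware behaves this way.

```python
FORMATS = ("RGBA8", "RGB8", "RGB565", "RGB5A1", "RGBA4")
SIXTEEN_BIT = {"RGB565", "RGB5A1", "RGBA4"}


def conversion_works(src, dst):
    """Return True if the table lists 'Has interrupt, correct output'."""
    if src not in FORMATS or dst not in FORMATS:
        raise ValueError("unknown format")
    if src == "RGBA8":
        return True             # RGBA8 converts to every format
    if src == "RGB8":
        return dst == "RGB8"    # RGB8 only converts to itself
    return dst in SIXTEEN_BIT   # 16-bit sources only reach 16-bit outputs
```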
=== Tiled to tiled === | |||
Officially this is always used with 2x2 downscale; other configurations give glitched output. Hence, this is used for antialiasing and mipmap generation. | |||
The following constraints apply: | |||
* Output dimensions should not be bigger than input ones, otherwise the output is glitched. | |||
* Width dimensions must be >= 64. | |||
* Height dimensions must be >= 32. | |||
* Width dimensions are required to be aligned to 64 bytes when doing RGB8/RGBA8 transfers. | |||
** Otherwise they are required to be aligned to 128 bytes. | |||
Format conversion results: same as tiled->linear. | |||
=== TextureCopy === | === TextureCopy === | ||
When bit 3 of the control register is set, the hardware performs a TextureCopy-mode transfer | When bit 3 of the control register is set, the hardware performs a TextureCopy-mode transfer: no format conversions are done; instead, a raw data copy is performed from the source to the destination, with a configurable gap between lines. All bits of the control register are ignored, except for input/output dimensions, which are used for line width and gap, and bit 2, which must be set when gaps are used. | ||
The total number of bytes to copy is specified in the size register; the hardware loops reading lines from the input and writing them to the output until this amount is copied. The gap specifies the number of bytes to skip after each line read (a gap of 0 results in a contiguous read). Gaps do not count towards the total size of the transfer. | |||
When setting line width and gap, they must be divided by 2 (it can be thought of as the calculation being done in bits, with the values stripped of their lower 4 bits for alignment). For example, if the left half of a 32x32 RGB8 texture is to be copied, the parameters will be: | |||
line width = (16 * 24) >> 4 = 24 | |||
gap = line width | |||
size = 16 * 32 * 3 = 1536 | |||
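The example above can be reproduced with a short sketch, together with a byte-level model of the copy loop. The names `texcopy_units` and `simulate_texture_copy` are hypothetical. The model takes widths and gaps in bytes and is an assumption based on the description of the loop, not verified hardware behaviour.

```python
def texcopy_units(pixels, bits_per_pixel):
    """Register value for a width or gap, following the example's arithmetic."""
    return (pixels * bits_per_pixel) >> 4


def simulate_texture_copy(src, size, in_width, in_gap, out_width, out_gap, dst_len):
    """Copy `size` bytes line by line, skipping gaps on both sides.

    Gaps are skipped after each full line and do not count towards `size`.
    """
    if in_width <= 0 or out_width <= 0:
        raise ValueError("line width must not be 0")
    dst = bytearray(dst_len)
    src_pos = dst_pos = 0
    in_left, out_left = in_width, out_width
    for _ in range(size):
        dst[dst_pos] = src[src_pos]
        src_pos += 1
        dst_pos += 1
        in_left -= 1
        out_left -= 1
        if in_left == 0:       # end of an input line: skip the input gap
            src_pos += in_gap
            in_left = in_width
        if out_left == 0:      # end of an output line: skip the output gap
            dst_pos += out_gap
            out_left = out_width
    return bytes(dst)
```

For the left half of a linear 32x32 RGB8 texture, each row is 96 bytes, so the input line width and gap are both 48 bytes, and a contiguous destination needs 1536 bytes.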
By correctly calculating the input and output gap sizes it is possible to use this functionality to copy arbitrary sub-rectangles between differently-sized framebuffers or textures, which is one of its main uses over a regular no-conversion DisplayTransfer. When copying tiled textures/framebuffers it's important to remember that the contents of a tile are laid out sequentially in memory, and so this should be taken into account when calculating the transfer parameters. | By correctly calculating the input and output gap sizes it is possible to use this functionality to copy arbitrary sub-rectangles between differently-sized framebuffers or textures, which is one of its main uses over a regular no-conversion DisplayTransfer. When copying tiled textures/framebuffers it's important to remember that the contents of a tile are laid out sequentially in memory, and so this should be taken into account when calculating the transfer parameters. | ||
Specifying invalid/junk values for the TextureCopy dimensions can result in the GPU hanging while attempting to process this TextureCopy. | Specifying invalid/junk values for the TextureCopy dimensions can result in the GPU hanging while attempting to process this TextureCopy. For instance, when in contiguous mode the size must be at least 16; when in gap mode, the size must be at least 192, and the line width must not be 0. | ||
== Command List == | == Command List == |