GPU/External Registers: Difference between revisions

MarcusD (talk | contribs)
Add and update some newly researched PDC regs
Kynex7510 (talk | contribs)
mNo edit summary
(18 intermediate revisions by 3 users not shown)
Line 9: Line 9:
! Name
! Name
! Comments
! Comments
|-
| 0x1EF00000
| 0x10400000
| 4
| Hardware ID
| Bit2: new model
|-
|-
| 0x1EF00004
| 0x1EF00004
Line 32: Line 38:
| 4
| 4
| VRAM bank control
| VRAM bank control
| Bits 8-11 = bank[i] disabled; other bits are unused
| Bits 8-11 = bank[i] disabled; other bits are unused.
|-
|-
| 0x1EF00034
| 0x1EF00034
Line 38: Line 44:
| 4
| 4
| GPU Busy
| GPU Busy
| Bit31 = cmd-list busy, bit27 = PSC0 busy, bit26 = PSC1 busy.
| Bit26 = PSC0, bit27 = PSC1, Bit30 = PPF, Bit31 = P3D
|-
|-
| 0x1EF00050
| 0x1EF00050
Line 134: Line 140:


Memory fills are used to initialize buffers in memory with a given value, similar to memset. A memory fill is triggered by setting bit0 in the control register. Doing so aborts any running memory fills on that filling unit. Upon completion, the hardware unsets bit0 and sets bit1 and fires interrupt PSC0.
Memory fills are used to initialize buffers in memory with a given value, similar to memset. A memory fill is triggered by setting bit0 in the control register. Doing so aborts any running memory fills on that filling unit. Upon completion, the hardware unsets bit0 and sets bit1 and fires interrupt PSC0.
The addresses must not be part of FCRAM.


These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]].
These registers are used by [[GSP Shared Memory#GX SetMemoryFill|GX SetMemoryFill]].
Line 140: Line 148:


All of these registers must be accessed with 32bit operations regardless of the registers' actual bit size.
All of these registers must be accessed with 32bit operations regardless of the registers' actual bit size.
The naming of these parameters reflects the physical characteristics of the displays, and not the way the 3DS is normally held.
To make sense of these values, hold the 3DS so that the bottom screen is in the left hand and the top screen is in the right hand; the first pixel is then in the top-left corner, as it should be. If the 3DS is held normally, the first pixel is in the bottom-left corner.
All pixel and scanline timing values are 12bits, unless noted. This also applies to those fields where two u16 are combined into one register. Each u16 field is only 12bits in size.
The horizontal timing parameter order is as follows (values may overflow through HTotal register value):
0x10 < 0x14 <= 0x60.LO <= 0x04 <= 0x60.HI <= 0x08 <= 0x0C <= 0x10
0x18 <= 0x60.LO
Timing starts from HCount == 0, then each absolute value in the aforementioned register chain triggers when HCount == register, latching the primitive display controller into a new mode.
There is an inherent latch order: if two events occur simultaneously, one event wins over the other.
Known latched modes (in order):
- HSync (triggers a line to the LCD to move to the next line)
- Back porch (area between HSync and border being displayed, no pixels pushed, min 16 pixel clocks, otherwise the screen gets glitchy)
- Left border start (no image data is being displayed, just a configurable solid color)
- Image start (pixel data is being DMA'd from video memory or main RAM)
- Right border start/Image end (border color is being displayed after the main image)
- Unknown synchronization (probably meant to be the right border end, but this mode seems to be broken or to do nothing)
- Front porch (no pixels pushed, 68 clock min, otherwise the screen doesn't sync properly, and really glitches out)
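
The clock relations quoted in the register table for HTotal and VTotal can be checked numerically. The following is a minimal sketch, assuming a pixel clock of 268111856 / 24 Hz and an HTotal of 450, both taken from the 50 Hz example in the VTotal entry; the function name <code>refresh_rates</code> is hypothetical and not part of any library.

```python
def refresh_rates(pclock_hz, htotal, vtotal):
    """Return (HClock, VClock) in Hz from the divider formulas.

    HClock = PClock / (HTotal + 1)
    VClock = PClock / (HTotal + 1) / (VTotal + 1)
    """
    hclock = pclock_hz / (htotal + 1)
    vclock = hclock / (vtotal + 1)
    return hclock, vclock


# Assumed values, taken from the VTotal example in the table.
PCLOCK_HZ = 268111856 / 24
hclock, vclock = refresh_rates(PCLOCK_HZ, htotal=450, vtotal=494)
print(f"HClock = {hclock:.3f} Hz, VClock = {vclock:.6f} Hz")
```

With these inputs the vertical rate comes out at roughly 50.04 Hz, matching the figure quoted for VTotal = 494.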


{| class="wikitable" border="1"
{| class="wikitable" border="1"
Line 147: Line 177:
|-
|-
| 0x00
| 0x00
| H-total (V-total on not physically rotated screens).
| HTotal
| 12bits.
| The total width of a timing scanline. In other words, this is the horizontal refresh clock divider value.


Setting this value too low will make the screen not be able to sync any pixels other than a single one from the wrong location. The lowest the screen can handle is 0x1C2, at 0x1C1 the display loses a few scanlines worth of pixel clock (though not noticeable).
HClock = PClock / (HTotal + 1)
|-
|-
| 0x04
| 0x04
| HBlank timer(?)
| HStart
| Seems to determine the horizontal blanking interval.
| Determines when the image is going to be displayed in the visible region (register 0x60).
 
 
Setting this to lower than <code>HTotal - HDisp</code> will make the screen not catch up with the scanlines, some will be skipped, some will be misaligned.
 
Setting this to higher than <code>HTotal - HDisp</code> will make the displayed image misaligned to the right.
 
Setting this to higher than <code>HTotal</code> seems to make the horizontal synchronization never happen.
|-
|-
| 0x08
| 0x08
| ?
| HBR
| must be >= REG#0x00
| Right border start(?). Does nothing.
 
While this register seems to have no impact on the image whatsoever, it still has to be set to a valid value.
|  
|-
|-
| 0x0C
| 0x0C
| ?
| HPF
| must be >= REG#0x08
| Front porch. The image is blanked during this period, and no pixels are pushed to the LCD.
 
For unknown reasons, a single red dot is displayed before entering this mode.
|-
|-
| 0x10
| 0x10
| Window X start (LgyFb)
| HSync
| Offsets the viewing window on the display's physical X co-ordinate relative to its front porch end.
| Triggers a HSync pulse.


Outside of LgyFb changing this value only seems to cause weird pixel interpolation and blurriness.
Based on behavior, this needs to last at least a pixel clock for the LCD to register the sync.
|-
|-
| 0x14
| 0x14
| Window X end (LgyFb)
| HPB
| Window X start + window width - 1 is written here to set the end display scan pixel offset.
| Back porch? Has to be at least one bigger than HSync, otherwise HSync never triggers.


Outside of LgyFb it seems to offset the screen to the left if this value is high enough, but can glitch out the syncing on the bottom screen. High enough values will make the screen skip too many "pixels". If this value is higher or equal to *some value* (aka. if less than one pixel per line is displayed on the screen) then the screen will lose synchronization.
The display is blank, and the LCD displays nothing in this period (doesn't push pixels).
|-
|-
| 0x18
| 0x18
| Window Y start (LgyFb)
| HBL
| Offsets the viewing window on the display's physical Y co-ordinate relative to the first visible scanline.
| Left border trigger threshold. Enables pushing pixels to the display.
 
If this value is smaller than the back porch, then the back porch period will be zero, and the border will be immediately displayed upon entering the back porch period.
 
Can be lower than HSync, as the back porch is what takes the controller out of HSync.
 
Must be <= HDisp start (reg 0x60 low u16), otherwise no pixels will be pushed due to a glitched state.
|-
|-
| 0x1C
| 0x1C
Line 199: Line 233:
|-
|-
| 0x20
| 0x20
| Low: Window Y end (LgyFb)
| low u16: ???
High: ???
high u16: ???
| Low: Window Y start + window height - 1 is written here to set the last display scanline relative to the first visible scanline.
| ???
 
|-
High: This is cleared to zero when displaying LgyFb. Outside of LgyFb this doesn't really seem to do anything useful.
| 0x24
| VTotal
| Total height of the timing window. Can be interpreted as the vertical clock divider.


VClock = PClock / (HTotal + 1) / (VTotal + 1)


??? extra pixels get inserted in the first displayed scanline and thus the image gets shifted to the right. Seems to make horizontal syncing a bit glitchy. If a HSync occurs, the pixel data is suspended until the first pixel is supposed to be displayed, then the pixel stream will continue where it left off until a delayed HSync gets processed relative to the pixel data.
Setting this to 494 lowers framerate to about 50.040660858 Hz ((268111856 / 24) / (450 + 1) / (494 + 1)).
|-
| 0x24
| V-total (H-total on not physically rotated screens).
| Total scanlines including porches/sync timing. Setting this to 494 for the topscreen lowers framerate to about 50.040660858 Hz.
|-
|-
| 0x28
| 0x28
| VBlank timer(?)
| ?
| Seems to determine the vertical blanking interval.
| Seems to determine the vertical blanking interval.


Line 224: Line 257:
|-
|-
| 0x30
| 0x30
| VTotal
| ?
| Total amount of vertical scanlines in the pixel buffer, must be bigger than *an unknown blanking-like value*. If this value is less than VDisp then the last two scanlines will be repeated interlaced until VDisp is reached.
| Total amount of vertical scanlines in the pixel buffer, must be bigger than *an unknown blanking-like value*. If this value is less than VDisp then the last two scanlines will be repeated interlaced until VDisp is reached.
|-
|-
Line 251: Line 284:
| 0x4C
| 0x4C
| Overscan filler color
| Overscan filler color
|  
| 24bits(? top 8bits ignored)
 
When the visible region is being drawn, but the timing parameters are set up in a way that the framebuffer is smaller than the visible region, it will be filled by this color.
|-
|-
| 0x50
| 0x50
Line 263: Line 298:
| 0x5C
| 0x5C
| ???
| ???
| low u16: framebuffer width
| low u16: Image width (including some offset?)
high u16: framebuffer height??? (seems to be unused)
high u16: Image height??? (seems to be unused)
|-
|-
| 0x60
| 0x60
| ???
| HDisp
| low u16: timing data(?)
| low u16: Image start (border --> pixel data)
high u16: framebuffer total width (amount of pixels blitted regardless of framebuffer width)
high u16: Image end (pixel data --> border)
|-
|-
| 0x64
| 0x64
Line 285: Line 320:
|-
|-
| 0x70
| 0x70
| Framebuffer format
| Framebuffer format and other settings
| Bit0-15: framebuffer format, bit16-31: unknown
| See [[#Framebuffer_format|framebuffer format]]
|-
|-
| 0x74
| 0x74
Line 301: Line 336:
Bit 4: Currently displaying framebuffer?
Bit 4: Currently displaying framebuffer?
Bit 8: Reset FIFO?
Bit 8: Reset FIFO?
Bit 16: H(Blank?) IRQ status/ack. Write 1 to acknowledge.
Bit 16: HBlank IRQ status/ack. Write 1 to acknowledge.
Bit 17: VBlank IRQ status/ack.
Bit 17: VBlank IRQ status/ack.
Bit 18: Error IRQ status/ack?
Bit 18: Error IRQ status/ack?
Line 316: Line 351:
| 0x90
| 0x90
| Framebuffer stride
| Framebuffer stride
| Distance in bytes between the start of two framebuffer rows (must be a multiple of 8).
| 32bits (bottom 3bits ignored?)
 
Distance in bytes between the start of two framebuffer rows (must be a multiple of 8).


In other words, this can be interpreted as the amount to add to the framebuffer pointer after displaying a scanline.
In other words, this can be interpreted as the amount to add to the framebuffer pointer after displaying a scanline.
Line 322: Line 359:
Setting this to zero will cause only the first line of the image to be displayed repeated on the entire display. With the HSync interrupt it's possible to "race the beam" to (ab)use this feature.
Setting this to zero will cause only the first line of the image to be displayed repeated on the entire display. With the HSync interrupt it's possible to "race the beam" to (ab)use this feature.


Because of this simplicity, writing a negative value here VFlips the image (which appears as a HFlip if the 3DS is held properly)
Because of this simplicity, writing a negative value here VFlips the image, although that requires the framebuffer pointer register to be set to the start of the last scanline, instead of at the start of the framebuffer.
|-
|-
| 0x94
| 0x94
| Framebuffer B first address
| Framebuffer B first address
| For top screen, this is the right eye 3D framebuffer. Unused for bottom screen.
| For top screen, this is the right eye 3D framebuffer. Unused for bottom screen in userland.
|-
|-
| 0x98
| 0x98
| Framebuffer B second address
| Framebuffer B second address
| For top screen, this is the right eye 3D framebuffer. Unused for bottom screen.
| For top screen, this is the right eye 3D framebuffer. Unused for bottom screen in userland.
|}
|}
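
The negative-stride VFlip described for register 0x90 can be sketched as follows. This is an illustration only: the helper name <code>vflip_params</code> and the example base address, stride and height are assumptions, and the negative stride is assumed to be written as a 32-bit two's complement value.

```python
def vflip_params(fb_base, stride, scanlines):
    """Return (framebuffer pointer, stride register value) for a VFlip.

    fb_base   -- address of the first scanline of the framebuffer
    stride    -- positive distance in bytes between two rows
    scanlines -- number of rows that are scanned out
    """
    if stride <= 0 or stride % 8 != 0:
        raise ValueError("stride must be a positive multiple of 8")
    # The pointer register must point at the start of the LAST scanline.
    start = fb_base + stride * (scanlines - 1)
    # Assumption: a negative stride is encoded as 32-bit two's complement.
    stride_reg = (-stride) & 0xFFFFFFFF
    return start, stride_reg


# Hypothetical example: 720-byte rows (240 pixels * 3 bytes), 400 rows.
start, stride_reg = vflip_params(0x18000000, 720, 400)
print(hex(start), hex(stride_reg))
```

After each scanline the hardware adds the stride to its internal pointer, so starting at the last row with a negative stride walks the buffer backwards.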


Line 340: Line 377:
|-
|-
| 2-0
| 2-0
| Color format
| [[#Framebuffer_color_formats|Color format]]
|-
| 3
| ?
|-
|-
| 5-4
| 5-4
| Framebuffer scanline output mode (interlace config)
| Framebuffer interlacing mode
 
0 - A  (no interlacing)
1 - AA (scanline doubling)
2 - AB (interlace enable)
3 - BA (same as above, but the fields are inverted)
 
In AB and BA interlace modes, a scanline from each framebuffer is output in an alternating manner. In AB mode, Framebuffer A is output on the first display scanline. Similarly, in BA mode, Framebuffer B gets output to the first display scanline.
 
The way AB and BA modes work, is that a scanline is output, the framebuffer stride value is added to the internal scanline pointer value, and the other framebuffer is selected. And this alternates until the end of the draw region.
 
AA interlacing works like AB interlacing, except both internal framebuffer pointers are set to the Framebuffer A pointer value.


0 - A (output image as normal)
In A mode (no interlacing), it doesn't switch to the other framebuffer at the end of outputting a scanline to the display.
1 - AA (output a single line twice, aka framebuffer A is interlaced with itself)
 
2 - AB (interlace framebuffer A and framebuffer B)
Bottom screen has this set to 0 (A mode, no interlacing) at all times. 
3 - BA (same as above, but the line from framebuffer B is outputted first)
Top screen uses AB interlacing in 3D mode (with 3D slider enabled), and A mode (no interlacing) in 2D mode.


0 is used by bottom screen at all times.
1 is used by the top screen in 2D mode.
2 is used by top screen in 3D mode.
3 goes unused in userland.
|-
|-
| 6
| 6
| Scan doubling enable?* (used by top screen)
| Alternative pixel output mode*
|-
|-
| 7
| 7
Line 365: Line 406:
|-
|-
| 9-8
| 9-8
| Value 1 = unknown: get rid of rainbow strip on top of screen, 3 = unknown: black screen.
| DMA size
 
0 -  4 FCRAM words (32 bytes)
1 -  8 FCRAM words (64 bytes)
2 - 16 FCRAM words (128 bytes)
3 - ???
 
FCRAM doesn't support DMA size 3, as it can only burst up to 16 words (128 bytes), and will show a black screen instead.
|-
|-
| 15-10
| 31-16
| Unused?
| Unknown
|}
|}


* The weird thing about scan doubling is that it works differently between the bottom and top LCD. On the bottom LCD, it doubles the number of outputted pixels (so the same pixel is outputted twice, effectively doing column doubling). However on the top screen, it does scanline doubling instead. Considering that the bottom screen's table doesn't work on the top screen, this could give a hint as to how the top screen receives the pixel data from the PDC.
 
 
<nowiki>*</nowiki> The weird thing about bit6 is that it works differently between the bottom and top LCD. On the bottom LCD, it doubles the number of outputted pixels (so the same pixel is outputted twice, effectively doing pixel/column doubling). However on the top screen, it does scanline doubling instead.
Most likely the top screen receives two pixels at once per clock unit, outputting two scanlines simultaneously.
 
On a 2DS, it seems to have no effect on the top part of the display, and on the bottom screen it just shifts the framebuffer to the right two pixels.
On a 2DS, it seems to have no effect on the top part of the display, and on the bottom screen it just shifts the framebuffer to the right two pixels.


 
GSP module only allows the LCD stereoscopy (3D) to be enabled when bit5=1 and bit6=0 here. When GSP module updates this register, GSP module will automatically disable the stereoscopy if those bits are not set for enabling stereoscopy.
GSP module only allows the LCD stereoscopy to be enabled when bit5=1 and bit6=0 here. When GSP module updates this register, GSP module will automatically disable the stereoscopy if those bits are not set for enabling stereoscopy.




When both interlacing and scan doubling are disabled, the full resolution of the top screen (240x800) can be utilized if the PDC registers are updated to accommodate this higher resolution. GSP contains tables for this mode (gsp mode == 1). GSP automatically applies this mode if both bit5 and bit6 are cleared. This is also the default, and the only valid mode for the bottom screen in userland.
When both interlacing and alternative mode are disabled (bit5=0, bit6=0), the full resolution of the top screen (240x800) can be utilized if the PDC registers are updated to accommodate this higher resolution. GSP contains tables for this mode (gsp mode == 1). GSP automatically applies this mode if both bit5 and bit6 are cleared. This is also the default, and the only valid mode for the bottom screen in userland.


If only AB interlacing is enabled, gsp detects this as a request to switch to 3D mode (gsp mode == 2), and enables the parallax barrier.
If only AB interlacing is enabled (bit5=1, bit6=0), gsp detects this as a request to switch to 3D mode (gsp mode == 2), and enables the parallax barrier.
It's unknown how to control this, but some other PDC registers control if interlacing should be done by true interleaving (both framebuffers are treated as 240x400), or skipping lines (both framebuffers are treated as 240x800)
It's unknown how to control this, but some other PDC registers control if interlacing should be done by true interleaving (both framebuffers are treated as 240x400), or by skipping lines (both framebuffers are treated as 240x800).


If only scan doubling is enabled, gsp detects it as a request to switch back to 2D mode for the top screen (gsp mode == 0). This is also the default mode for the top screen.
If only alternative mode is enabled (bit5=0, bit6=1), gsp detects it as a request to switch back to 2D mode for the top screen (gsp mode == 0). This is also the default mode for the top screen.


Both interlacing and scan doubling can't be enabled in usermode, but it works as expected in baremetal.
Both interlacing and scan doubling can't be enabled in usermode, but it works as expected in baremetal.
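
The bit fields and the GSP mode selection described above can be summarised in a small decoder. This is a minimal sketch based only on the fields listed in this section; the names <code>decode_fb_format</code> and <code>gsp_mode</code> are hypothetical, and fields not described here are left out.

```python
INTERLACE_MODES = {0: "A", 1: "AA", 2: "AB", 3: "BA"}
# Bytes per DMA burst for bits 9-8; value 3 is unknown (black screen on FCRAM).
DMA_BYTES = {0: 32, 1: 64, 2: 128, 3: None}


def decode_fb_format(value):
    """Split a framebuffer format register value into its known fields."""
    return {
        "color_format": value & 0x7,                       # bits 2-0
        "interlace": INTERLACE_MODES[(value >> 4) & 0x3],  # bits 5-4
        "alt_pixel_output": bool(value & (1 << 6)),        # bit 6
        "dma_bytes": DMA_BYTES[(value >> 8) & 0x3],        # bits 9-8
    }


def gsp_mode(value):
    """Return the gsp mode implied by bit5/bit6, or None if both are set."""
    bit5 = bool(value & (1 << 5))
    bit6 = bool(value & (1 << 6))
    if bit5 and bit6:
        return None  # not allowed in usermode according to the text
    if bit5:
        return 2     # AB interlacing only: 3D mode
    if bit6:
        return 0     # bit6 only: 2D mode for the top screen
    return 1         # both cleared: full 240x800 resolution


print(decode_fb_format(0x120), gsp_mode(0x120))
```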
Line 452: Line 503:


These registers are used by [[GSP_Shared_Memory|GX command]] 3 and 4. For cmd4, *0x1EF00C18 |= 1 is used instead of just writing value 1. The DisplayTransfer registers are only used if bit 3 of the flags is unset and ignored otherwise. The TextureCopy registers are likewise only used if bit 3 is set, and ignored otherwise.
These registers are used by [[GSP_Shared_Memory|GX command]] 3 and 4. For cmd4, *0x1EF00C18 |= 1 is used instead of just writing value 1. The DisplayTransfer registers are only used if bit 3 of the flags is unset and ignored otherwise. The TextureCopy registers are likewise only used if bit 3 is set, and ignored otherwise.
The minimum supported dimension for output is 64x64, anything lower will hang the engine.


==== Flags Register - 0x1EF00C10 ====
==== Flags Register - 0x1EF00C10 ====
Line 506: Line 559:
=== TextureCopy ===
=== TextureCopy ===


When bit 3 of the control register is set, the hardware performs a TextureCopy-mode transfer. In this mode, all other bits of the control register (except for bit 2, which still needs to be set correctly) and the regular dimension registers are ignored, and no format conversions are done. Instead, it performs a raw data copy from the source to the destination, but with a configurable gap between lines. The total amount of bytes to copy is specified in the size register, and the hardware loops reading lines from the input and writing them to the output until this amount is copied. The "gap" specified in the input/output dimension register is the number of chunks to skip after each "width" chunks of the input/output, and is NOT counted towards the total size of the transfer.
When bit 3 of the control register is set, the hardware performs a TextureCopy-mode transfer: no format conversions are done, instead a raw data copy is performed from the source to the destination, with a configurable gap between lines. All bits of the control register are ignored, except for input/output dimensions, which are used for line width and gap, and bit 2, which must be set when gaps are used.
 
The total amount of bytes to copy is specified in the size register; the hardware loops reading lines from the input and writing them to the output until this amount is copied. The gap specifies the number of bytes to skip after each line read (a gap of 0 results in a contiguous read). Gaps do not count towards the total size of the transfer.
 
When setting line width and gap they must be divided by 2 (it can be thought of as the calculation being done in bits, and the values being stripped of their lower 4 bits for the alignment). For example, if the left half of a 32x32 RGB8 texture is to be copied, the parameters will be:
line width = (16 * 24) >> 4 = 24
gap = line width
size = 16 * 32 * 3 = 1536
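
The same calculation can be written as a small helper. This is a sketch under stated assumptions: the function name <code>texcopy_params</code> is hypothetical, the data is assumed to be laid out linearly (tiled layouts need extra care), and the sanity checks are taken from the minimum sizes quoted at the end of this section.

```python
def texcopy_params(line_pixels, gap_pixels, rows, bits_per_pixel):
    """Return (line width, gap, size) for a TextureCopy of a sub-rectangle.

    Line width and gap are computed in bits and stripped of their lower
    4 bits; size is the total number of bytes copied (gaps not included).
    """
    line_width = (line_pixels * bits_per_pixel) >> 4
    gap = (gap_pixels * bits_per_pixel) >> 4
    size = line_pixels * rows * bits_per_pixel // 8

    # Minimum sizes reported to avoid hanging the GPU.
    if gap == 0:
        if size < 16:
            raise ValueError("contiguous copy: size must be at least 16")
    else:
        if size < 192 or line_width == 0:
            raise ValueError("gap copy: size >= 192 and line width != 0")
    return line_width, gap, size


# Left half of a 32x32 RGB8 texture: copy 16 pixels, skip 16 pixels, 32 rows.
print(texcopy_params(line_pixels=16, gap_pixels=16, rows=32, bits_per_pixel=24))
```

For this input the helper returns a line width of 24, a gap of 24 and a size of 1536, the same values as in the worked example.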


By correctly calculating the input and output gap sizes it is possible to use this functionality to copy arbitrary sub-rectangles between differently-sized framebuffers or textures, which is one of its main uses over a regular no-conversion DisplayTransfer. When copying tiled textures/framebuffers it's important to remember that the contents of a tile are laid out sequentially in memory, and so this should be taken into account when calculating the transfer parameters.
By correctly calculating the input and output gap sizes it is possible to use this functionality to copy arbitrary sub-rectangles between differently-sized framebuffers or textures, which is one of its main uses over a regular no-conversion DisplayTransfer. When copying tiled textures/framebuffers it's important to remember that the contents of a tile are laid out sequentially in memory, and so this should be taken into account when calculating the transfer parameters.


Specifying invalid/junk values for the TextureCopy dimensions can result in the GPU hanging while attempting to process this TextureCopy.
Specifying invalid/junk values for the TextureCopy dimensions can result in the GPU hanging while attempting to process this TextureCopy. For instance, when in contiguous mode the size must be at least 16; when in gap mode, the size must be at least 192, and the line width must not be 0.


== Command List ==
== Command List ==