Changes

→‎LCD Source Framebuffer Setup: Correct typo in mathematical calculation.
Line 2: Line 2:     
== Map ==
 
== Map ==
 +
Address mappings for the external registers. GSPGPU:WriteHWRegs takes these addresses relative to 0x1EB00000.
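The relationship between the three address forms can be sketched in C as follows. This is a minimal illustration and not code from any SDK: the helper names `gsp_hwreg_offset` and `gpu_reg_user_to_phys` are hypothetical, and the constants are taken only from the table rows below (user VA 0x1EF00000 paired with physical address 0x10400000) and from the 0x1EB00000 base mentioned above.

```c
#include <stdint.h>

/* From the Map table: user VA 0x1EF00000 and physical address
 * 0x10400000 refer to the same register. */
#define GPU_REGS_USER_VA 0x1EF00000u
#define GPU_REGS_PHYS    0x10400000u
/* Base that GSPGPU:WriteHWRegs addresses are relative to. */
#define GSP_HWREGS_BASE  0x1EB00000u

/* Hypothetical helper: offset to hand to GSPGPU:WriteHWRegs
 * for a register given by its user VA. */
static inline uint32_t gsp_hwreg_offset(uint32_t user_va)
{
    return user_va - GSP_HWREGS_BASE;
}

/* Hypothetical helper: physical address of a register given by
 * its user VA inside the GPU register window. */
static inline uint32_t gpu_reg_user_to_phys(uint32_t user_va)
{
    return user_va - GPU_REGS_USER_VA + GPU_REGS_PHYS;
}
```

For example, the PDC0 block at user VA 0x1EF00400 corresponds to physical address 0x10400400 and to the relative offset 0x400400.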
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
 
! User VA
 
! User VA
Line 8: Line 9:  
! Name
 
! Name
 
! Comments
 
! Comments
 +
|-
 +
| 0x1EF00000
 +
| 0x10400000
 +
| 4
 +
| Hardware ID
 +
| Bit2: new model
 
|-
 
|-
 
| 0x1EF00004
 
| 0x1EF00004
Line 30: Line 37:  
| 0x10400030
 
| 0x10400030
 
| 4
 
| 4
| ?
+
| VRAM bank control
|
+
| Bits 8-11 = bank[i] disabled; other bits are unused.
 
|-
 
|-
 
| 0x1EF00034
 
| 0x1EF00034
Line 37: Line 44:  
| 4
 
| 4
 
| GPU Busy
 
| GPU Busy
| Bit31 = cmd-list busy, bit27 = PSC0 busy, bit26 = PSC1 busy.
+
| Bit26 = PSC0, bit27 = PSC1, Bit30 = PPF, Bit31 = P3D
 
|-
 
|-
 
| 0x1EF00050
 
| 0x1EF00050
Line 50: Line 57:  
| ?
 
| ?
 
| Writes 0xFF2 on GPU init.
 
| Writes 0xFF2 on GPU init.
 +
|-
 +
| 0x1EF000C0
 +
| 0x104000C0
 +
| 4
 +
| Backlight control
 +
| Writes 0x0 to allow backlights to turn off, 0x20000000 to force them always on.
 
|-
 
|-
 
| 0x1EF00400
 
| 0x1EF00400
 
| 0x10400400
 
| 0x10400400
 
| 0x100
 
| 0x100
| [[#Framebuffer_Setup|Framebuffer Setup]] "PDC0" (top screen)
+
| [[#LCD Source Framebuffer Setup|Framebuffer Setup]] "PDC0" (top screen)
 
|
 
|
 
|-
 
|-
Line 60: Line 73:  
| 0x10400500
 
| 0x10400500
 
| 0x100
 
| 0x100
| [[#Framebuffer_Setup|Framebuffer Setup]] "PDC1" (bottom)
+
| [[#LCD Source Framebuffer Setup|Framebuffer Setup]] "PDC1" (bottom)
 
|
 
|
 
|-
 
|-
Line 68: Line 81:  
| [[#Transfer_Engine|Transfer Engine]] "DMA"
 
| [[#Transfer_Engine|Transfer Engine]] "DMA"
 
|
 
|
 +
|-
 +
|colspan="5"| 0x1EF01000/0x10401000 - 0x1EF01C00/0x10401C00 maps to [[GPU/Internal_Registers|GPU internal registers]]. These registers are usually not read/written directly here, but are written using the command list interface below (corresponding to the GPUREG_CMDBUF_* internal registers)
 
|-
 
|-
 
| 0x1EF01000
 
| 0x1EF01000
Line 129: Line 144:     
== LCD Source Framebuffer Setup ==
 
== LCD Source Framebuffer Setup ==
 +
 +
All of these registers must be accessed with 32bit operations regardless of the registers' actual bit size.
 +
 +
The naming of these parameters reflects the physical characteristics of the displays, and not the way the 3DS is normally held.
 +
 +
To make sense of these values, hold the 3DS so that the bottom screen is in the left hand and the top screen is in the right hand. Held that way, the first pixel is in the top-left corner, as expected. If the 3DS is held normally, the first pixel is in the bottom-left corner.
 +
 +
All pixel and scanline timing values are 12bits, unless noted. This also applies to those fields where two u16 are combined into one register. Each u16 field is only 12bits in size.
 +
 +
The horizontal timing parameter order is as follows (values may overflow through HTotal register value):
 +
0x10 < 0x14 <= 0x60.LO <= 0x04 <= 0x60.HI <= 0x08 <= 0x0C <= 0x10
 +
0x18 <= 0x60.LO
 +
 +
Timing starts from HCount == 0, then each absolute value in the aforementioned register chain triggers when HCount == register, latching the primitive display controller into a new mode.
 +
There is an inherent latch order: if two events occur simultaneously, one event wins over the other.
 +
 +
Known latched modes (in order):
 +
- HSync (triggers a line to the LCD to move to the next line)
 +
- Back porch (area between HSync and border being displayed, no pixels pushed, min 16 pixel clocks, otherwise the screen gets glitchy)
 +
- Left border start (no image data is being displayed, just a configurable solid color)
 +
- Image start (pixel data is being DMA'd from video memory or main RAM)
 +
- Right border start/Image end (border color is being displayed after the main image)
 +
- Unknown synchronization (probably meant to be the right border end, but this mode seems to be broken or to do nothing)
 +
- Front porch (no pixels pushed, 68 clock min, otherwise the screen doesn't sync properly, and really glitches out)
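The ordering constraint above can be checked in software before writing a timing set. The sketch below is an interpretation, not a confirmed hardware model: it assumes that "overflow through HTotal" means the chain is circular with period HTotal + 1, so each register is measured by its forward distance from HSync. The struct and function names are hypothetical, and the minimum porch lengths from the list are not checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the horizontal timing registers. */
struct pdc_htiming {
    uint16_t htotal;   /* 0x00 */
    uint16_t hstart;   /* 0x04 */
    uint16_t hbr;      /* 0x08 */
    uint16_t hpf;      /* 0x0C */
    uint16_t hsync;    /* 0x10 */
    uint16_t hpb;      /* 0x14 */
    uint16_t hbl;      /* 0x18 */
    uint16_t hdisp_lo; /* 0x60 low u16 */
    uint16_t hdisp_hi; /* 0x60 high u16 */
};

/* Forward distance from HSync to v, wrapping at HTotal + 1
 * (the wrap period is an assumption). */
static uint32_t pdc_dist_from_hsync(const struct pdc_htiming *t, uint16_t v)
{
    uint32_t period = (uint32_t)t->htotal + 1u;
    return ((uint32_t)v + period - t->hsync) % period;
}

/* Checks 0x10 < 0x14 <= 0x60.LO <= 0x04 <= 0x60.HI <= 0x08 <= 0x0C
 * (wrapping back to 0x10), plus 0x18 <= 0x60.LO on the raw values. */
static bool pdc_htiming_order_ok(const struct pdc_htiming *t)
{
    const uint32_t period = (uint32_t)t->htotal + 1u;
    const uint16_t chain[] = { t->hpb, t->hdisp_lo, t->hstart,
                               t->hdisp_hi, t->hbr, t->hpf };
    uint32_t prev = 0;

    if (t->hsync >= period)
        return false;
    for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++) {
        if (chain[i] >= period)
            return false;
        uint32_t d = pdc_dist_from_hsync(t, chain[i]);
        /* First link is strict (HSync < HPB), the rest allow equality. */
        if (i == 0 ? d == 0 : d < prev)
            return false;
        prev = d;
    }
    return t->hbl <= t->hdisp_lo;
}
```

Measuring distances from HSync keeps the check valid when the chain wraps past HTotal, which a plain comparison of raw register values would reject.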
 +
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
 
! Offset
 
! Offset
! Length
   
! Name
 
! Name
 
! Comments
 
! Comments
 +
|-
 +
| 0x00
 +
| HTotal
 +
| The total width of a timing scanline. In other words, this is the horizontal refresh clock divider value.
 +
 +
HClock = PClock / (HTotal + 1)
 +
|-
 +
| 0x04
 +
| HStart
 +
| Determines when the image is going to be displayed in the visible region (register 0x60).
 +
|-
 +
| 0x08
 +
| HBR
 +
| Right border start(?). Does nothing.
 +
 +
While this register seems to have no impact on the image whatsoever, it still has to be set to a valid value.
 +
|
 +
|-
 +
| 0x0C
 +
| HPF
 +
| Front porch. The image is blanked during this period, and no pixels are pushed to the LCD.
 +
 +
Unknown why, but a single dot of red is displayed before entering this mode.
 +
|-
 +
| 0x10
 +
| HSync
 +
| Triggers a HSync pulse.
 +
 +
Based on behavior, this needs to last at least a pixel clock for the LCD to register the sync.
 +
|-
 +
| 0x14
 +
| HPB
 +
| Back porch? Has to be at least one bigger than HSync, otherwise HSync never triggers.
 +
 +
The display is blank, and the LCD displays nothing in this period (doesn't push pixels).
 +
|-
 +
| 0x18
 +
| HBL
 +
| Left border trigger threshold. Enables pushing pixels to the display.
 +
 +
If this value is smaller than the back porch, then the back porch period will be zero, and the border will be immediately displayed upon entering the back porch period.
 +
 +
Can be lower than HSync, as the back porch is what takes the controller out of HSync.
 +
 +
Must be <= HDisp start (reg 0x60 low u16), otherwise no pixels will be pushed due to a glitched state.
 +
|-
 +
| 0x1C
 +
| H Interrupt timing
 +
| Made up from two u16 values, PDC interrupt line is asserted when HCount == low u16, and most likely deasserted when HCount == high u16.
 +
 +
There seem to be some limitations though:
 +
* low u16 must be smaller than high u16
 +
* if low u16 is less than HTotal then high u16 must also be smaller than HTotal
 +
* setting low u16 to >= HTotal disables the interrupt ever firing
 +
 +
This is configured by gsp so that low u16 equals HTotal, meaning the HSync interrupt will never fire.
 +
|-
 +
| 0x20
 +
| low u16: ???
 +
high u16: ???
 +
| ???
 +
|-
 +
| 0x24
 +
| VTotal
 +
| Total height of the timing window. Can be interpreted as the vertical clock divider.
 +
 +
VClock = PClock / (HTotal + 1) / (VTotal + 1)
 +
 +
Setting this to 494 lowers framerate to about 50.040660858 Hz ((268111856 / 24) / (450 + 1) / (494 + 1)).
 +
|-
 +
| 0x28
 +
| ?
 +
| Seems to determine the vertical blanking interval.
 +
 +
 +
Setting this to lower than <code>VTotal - VDisp</code> will cut off the top <code>VTotal - VDisp - thisvalue</code> lines.
 +
 +
Setting this to higher than <code>VTotal - VDisp</code> will make the image be pushed downwards with the overscan color visible.
 +
 +
Setting this to higher than <code>HTotal</code> will make the GPU skip vertical pixel data synchronization (hence filling the screen with the rest of the pixel data past the given screen framebuffer size). Also will skip <code>thisvalue + somevalue - HTotal</code> lines into the "global" pixel buffer.
 +
|-
 +
| 0x30
 +
| ?
 +
| Total amount of vertical scanlines in the pixel buffer, must be bigger than *an unknown blanking-like value*. If this value is less than VDisp then the last two scanlines will be repeated interlaced until VDisp is reached.
 +
|-
 +
| 0x34
 +
| VDisp(?)
 +
| Total amount of vertical scanlines displayed (only for the top screen, it seems). If this value is less than VTotal then the rest of the scanlines will not be updated on the screen, so those will slowly fade out. Must be bigger than *an unknown blanking-like value*, otherwise an underflow will happen.
 +
|-
 +
| 0x38
 +
| Vertical data offset(?)
 +
| ??? Seems to offset the screen upwards if this value is high enough. If this value is higher or equal to *some value* (aka. if less than one scanline is displayed on the screen) then the screen will lose synchronization.
 +
|-
 +
| 0x40
 +
| V Interrupt timing
 +
| Similar to H Interrupt timing (0x1C), except the comparison is done against VCount, the limitations are imposed on VTotal, and the interrupt that fires is VSync.
 +
 +
One important note is that it seems like the VSync interrupt always fires at HCount == 0, and there doesn't seem to be a register to control this behavior.
 +
|-
 +
| 0x44
 +
| ???
 +
| similar functionality to 0x10
 +
|-
 +
| 0x48
 +
| ???
 +
| bit0 seems to disable HSync, bit8 seems to disable VSync, rest of the bits aren't writable.
 +
|-
 +
| 0x4C
 +
| Overscan filler color
 +
| 24bits(? top 8bits ignored)
 +
 +
When the visible region is being drawn, but the timing parameters are set up in a way that the framebuffer is smaller than the visible region, it will be filled by this color.
 +
|-
 +
| 0x50
 +
| HCount
 +
| Horizontal "beam position" counter. Note that this value does not equal the current pixel being drawn.
 +
|-
 +
| 0x54
 +
| VCount
 +
| Vertical "beam position" counter. Note that the scanline being drawn isn't equal to this value.
 
|-
 
|-
 
| 0x5C
 
| 0x5C
| 4
+
| ???
| Framebuffer width & height
+
| low u16: Image width (including some offset?)
| Lower 16 bits: width, upper 16 bits: height
+
high u16: Image height??? (seems to be unused)
 +
|-
 +
| 0x60
 +
| HDisp
 +
| low u16: Image start (border --> pixel data)
 +
high u16: Image end (pixel data --> border)
 +
|-
 +
| 0x64
 +
| ???
 +
| low u16: unknown
 +
high u16: framebuffer total height (amount of scanlines blitted regardless of framebuffer height)
 
|-
 
|-
 
| 0x68
 
| 0x68
| 4
   
| Framebuffer A first address
 
| Framebuffer A first address
 
| For top screen, this is the left eye 3D framebuffer.
 
| For top screen, this is the left eye 3D framebuffer.
 
|-
 
|-
 
| 0x6C
 
| 0x6C
| 4
   
| Framebuffer A second address
 
| Framebuffer A second address
 
| For top screen, this is the left eye 3D framebuffer.
 
| For top screen, this is the left eye 3D framebuffer.
 
|-
 
|-
 
| 0x70
 
| 0x70
| 4
+
| Framebuffer format and other settings
| Framebuffer format
+
| See [[#Framebuffer_format|framebuffer format]]
| Bit0-15: framebuffer format, bit16-31: unknown
+
|-
 +
| 0x74
 +
| PDC control
 +
| Bit 0: Enable display controller.
 +
Bit 8: HBlank IRQ mask (0 = enabled).
 +
Bit 9: VBlank IRQ mask (0 = enabled).
 +
Bit 10: Error IRQ mask? (0 = enabled).
 +
Bit 16: Output enable?
 
|-
 
|-
 
| 0x78
 
| 0x78
| 4
+
| Framebuffer select and status
| Framebuffer select
+
| Bit 0: Next framebuffer to display (after VBlank).
| Bit0: which framebuffer to display, bit1-7: unknown
+
Bit 4: Currently displaying framebuffer?
 +
Bit 8: Reset FIFO?
 +
Bit 16: HBlank IRQ status/ack. Write 1 to acknowledge.
 +
Bit 17: VBlank IRQ status/ack.
 +
Bit 18: Error IRQ status/ack?
 +
|-
 +
| 0x80
 +
| Color lookup table index select
 +
| 8bits, write-only
 +
|-
 +
| 0x84
 +
| Color lookup table indexed element
 +
| Contains the value of the color lookup table indexed by the above register, 24bits, RGB8 (0x00BBGGRR) 
 +
Accessing this register will increase the index register by one
 
|-
 
|-
 
| 0x90
 
| 0x90
| 4
   
| Framebuffer stride
 
| Framebuffer stride
| Distance in bytes between the start of two framebuffer rows (must be a multiple of 8).
+
| 32bits (bottom 3bits ignored?)
 +
 
 +
Distance in bytes between the start of two framebuffer rows (must be a multiple of 8).
 +
 
 +
In other words, this can be interpreted as the amount to add to the framebuffer pointer after displaying a scanline.
 +
 
 +
Setting this to zero will cause only the first line of the image to be displayed repeated on the entire display. With the HSync interrupt it's possible to "race the beam" to (ab)use this feature.
 +
 
 +
Because of this simplicity, writing a negative value here VFlips the image, although that requires the framebuffer pointer register to be set to the start of the last scanline, instead of at the start of the framebuffer.
 
|-
 
|-
 
| 0x94
 
| 0x94
| 4
   
| Framebuffer B first address
 
| Framebuffer B first address
| For top screen, this is the right eye 3D framebuffer. Unused for bottom screen.
+
| For top screen, this is the right eye 3D framebuffer. Unused for bottom screen in userland.
 
|-
 
|-
 
| 0x98
 
| 0x98
| 4
   
| Framebuffer B second address
 
| Framebuffer B second address
| For top screen, this is the right eye 3D framebuffer. Unused for bottom screen.
+
| For top screen, this is the right eye 3D framebuffer. Unused for bottom screen in userland.
 
|}
 
|}
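The HTotal and VTotal formulas in the table above can be evaluated directly. The sketch below assumes the pixel clock is 268111856 / 24 Hz, which is the value used in the VTotal example. The function names are hypothetical.

```c
#include <stdint.h>

/* Pixel clock in Hz, as used in the VTotal example (268111856 / 24). */
static double pdc_pixel_clock_hz(void)
{
    return 268111856.0 / 24.0;
}

/* HClock = PClock / (HTotal + 1) */
static double pdc_hclock_hz(uint32_t htotal)
{
    return pdc_pixel_clock_hz() / (double)(htotal + 1u);
}

/* VClock = PClock / (HTotal + 1) / (VTotal + 1) */
static double pdc_vclock_hz(uint32_t htotal, uint32_t vtotal)
{
    return pdc_hclock_hz(htotal) / (double)(vtotal + 1u);
}
```

With HTotal = 450 and VTotal = 494, `pdc_vclock_hz` reproduces the roughly 50.04 Hz figure quoted for register 0x24.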
   Line 183: Line 375:  
|-
 
|-
 
| 2-0
 
| 2-0
| Color format
+
| [[#Framebuffer_color_formats|Color format]]
 
|-
 
|-
| 3
+
| 5-4
| ?
+
| Framebuffer scanline output mode (framebuffer interleave config)
|-
+
 
| 4
+
0 - A  (output image as normal)
| Unused?
+
1 - AA (output a single line twice, so framebuffer A is interleaved with itself)
|-
+
2 - AB (interleave framebuffer A and framebuffer B)
| 5
+
3 - BA (same as above, but the line from framebuffer B is outputted first)
| Enable parallax barrier (i.e. 3D).
+
 
 +
0 is used by bottom screen at all times.
 +
1 is used by the top screen in 2D mode.
 +
2 is used by top screen in 3D mode.
 +
3 goes unused in userland.
 
|-
 
|-
 
| 6
 
| 6
| 1 = main screen, 0 = sub screen. However if bit5 is set, this bit is cleared.
+
| Scan doubling enable?* (used by top screen)
 
|-
 
|-
 
| 7
 
| 7
Line 201: Line 397:  
|-
 
|-
 
| 9-8
 
| 9-8
| Value 1 = unknown: get rid of rainbow strip on top of screen, 3 = unknown: black screen.
+
| DMA size
 +
 
 +
0 -  4 words (32 bytes)
 +
1 -  8 words (64 bytes)
 +
2 - 16 words (128 bytes)
 +
3 - ???
 +
 
 +
FCRAM doesn't support DMA size 3, as it can only burst up to 16 words (128 bytes), and will show a black screen instead.
 
|-
 
|-
| 15-10
+
| 31-16
| Unused?
+
| Unknown
 
|}
 
|}
 +
 +
* The weird thing about scan doubling is that it works differently between the bottom and top LCD. On the bottom LCD, it doubles the number of output pixels (so the same pixel is output twice, effectively doing column doubling). On the top screen, however, it does scanline doubling instead. Considering that the bottom screen's table doesn't work on the top screen, this could give a hint as to how the top screen receives the pixel data from the PDC.
 +
On a 2DS, it seems to have no effect on the top part of the display, and on the bottom screen it just shifts the framebuffer to the right two pixels.
 +
    
GSP module only allows the LCD stereoscopy to be enabled when bit5=1 and bit6=0 here. When GSP module updates this register, GSP module will automatically disable the stereoscopy if those bits are not set for enabling stereoscopy.
 
GSP module only allows the LCD stereoscopy to be enabled when bit5=1 and bit6=0 here. When GSP module updates this register, GSP module will automatically disable the stereoscopy if those bits are not set for enabling stereoscopy.
 +
 +
 +
When both interlacing and scan doubling are disabled, the full resolution of the top screen (240x800) can be utilized if the PDC registers are updated to accommodate this higher resolution. GSP contains tables for this mode (gsp mode == 1). GSP automatically applies this mode if both bit5 and bit6 are cleared. This is also the default, and the only valid mode for the bottom screen in userland.
 +
 +
If only AB interlacing is enabled, gsp detects this as a request to switch to 3D mode (gsp mode == 2), and enables the parallax barrier.
 +
It's unknown how to control this, but some other PDC registers control if interlacing should be done by true interleaving (both framebuffers are treated as 240x400), or skipping lines (both framebuffers are treated as 240x800)
 +
 +
If only scan doubling is enabled, gsp detects it as a request to switch back to 2D mode for the top screen (gsp mode == 0). This is also the default mode for the top screen.
 +
 +
Both interlacing and scan doubling can't be enabled in usermode, but it works as expected in baremetal.
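As a rough illustration of the bit layout and of the mode selection described above, the following sketch packs the known fields of register 0x70 and derives the gsp mode from bit5 and bit6. All names are hypothetical, and the mode mapping only restates the paragraphs above rather than any disassembled gsp logic.

```c
#include <stdint.h>

/* Scanline output modes, bits 5-4 of the format register. */
enum {
    FB_INTERLEAVE_A  = 0, /* output image as normal */
    FB_INTERLEAVE_AA = 1, /* each line of A output twice */
    FB_INTERLEAVE_AB = 2, /* interleave A and B */
    FB_INTERLEAVE_BA = 3  /* interleave B and A */
};

/* Pack the documented fields: color format (bits 2-0), interleave
 * mode (bits 5-4), scan doubling (bit 6), DMA size (bits 9-8). */
static inline uint32_t fb_format_pack(uint32_t color_format,
                                      uint32_t interleave,
                                      uint32_t scan_double,
                                      uint32_t dma_size)
{
    return (color_format & 0x7u)
         | ((interleave & 0x3u) << 4)
         | ((scan_double & 0x1u) << 6)
         | ((dma_size & 0x3u) << 8);
}

/* Mode gsp is described as selecting: 1 = full resolution,
 * 2 = 3D, 0 = 2D. Returns -1 when bit5 and bit6 are both set,
 * which is described as unavailable in usermode. */
static inline int gsp_mode_from_format(uint32_t reg)
{
    int bit5 = (int)((reg >> 5) & 1u);
    int bit6 = (int)((reg >> 6) & 1u);

    if (!bit5 && !bit6)
        return 1;
    if (bit5 && !bit6)
        return 2;
    if (!bit5 && bit6)
        return 0;
    return -1;
}
```

For instance, color format 1 with AA output and scan doubling packs to 0x51, which maps to gsp mode 0 (top screen 2D).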
    
=== Framebuffer color formats ===
 
=== Framebuffer color formats ===
Line 231: Line 448:  
|}
 
|}
 
Color components are laid out in reverse byte order, with the most significant bits used first (i.e. non-24-bit pixels are stored as little-endian values). For instance, a raw data stream of two GL_RGB565_OES pixels looks like GGGBBBBB RRRRRGGG GGGBBBBB RRRRRGGG.
 
Color components are laid out in reverse byte order, with the most significant bits used first (i.e. non-24-bit pixels are stored as little-endian values). For instance, a raw data stream of two GL_RGB565_OES pixels looks like GGGBBBBB RRRRRGGG GGGBBBBB RRRRRGGG.
 +
 +
Color formats 5, 6, and 7 are blocked by gsp, but they behave as pixel-doubled RGBA8 (not line doubling, but instead the same pixel is output twice) if used outside of userland.
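The RGB565 byte order described above can be made concrete with a small sketch. The function name `rgb565_store` is hypothetical. It writes one pixel explicitly byte by byte, so the result does not depend on the endianness of the machine running it.

```c
#include <stdint.h>

/* Store one RGB565 pixel as it appears in framebuffer memory:
 * a 16-bit value RRRRRGGG GGGBBBBB written little-endian, so the
 * first byte in memory is GGGBBBBB and the second is RRRRRGGG. */
static inline void rgb565_store(uint8_t out[2],
                                uint8_t r5, uint8_t g6, uint8_t b5)
{
    uint16_t px = (uint16_t)(((r5 & 0x1Fu) << 11)
                           | ((g6 & 0x3Fu) << 5)
                           |  (b5 & 0x1Fu));
    out[0] = (uint8_t)(px & 0xFFu); /* GGGBBBBB */
    out[1] = (uint8_t)(px >> 8);    /* RRRRRGGG */
}
```

A pure red pixel therefore ends up as the bytes 0x00 0xF8 in memory, and a pure blue pixel as 0x1F 0x00.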
    
== Transfer Engine ==
 
== Transfer Engine ==
Line 238: Line 457:  
|-
 
|-
 
| 0x1EF00C00
 
| 0x1EF00C00
| Input physical address>>3
+
| Input physical address >> 3
 
|-
 
|-
 
| 0x1EF00C04
 
| 0x1EF00C04
| Output physical address>>3
+
| Output physical address >> 3
 
|-
 
|-
 
| 0x1EF00C08
 
| 0x1EF00C08
| Output framebuffer dimensions, used with cmd3
+
| DisplayTransfer output width (bits 0-15) and height (bits 16-31).
 
|-
 
|-
 
| 0x1EF00C0C
 
| 0x1EF00C0C
| Input framebuffer dimensions, used with cmd3
+
| DisplayTransfer input width and height.
 
|-
 
|-
 
| 0x1EF00C10
 
| 0x1EF00C10
| Flags, used with cmd3 and cmd4.
+
| Transfer flags. (See below)
 
|-
 
|-
 
| 0x1EF00C14
 
| 0x1EF00C14
Line 257: Line 476:  
| 0x1EF00C18
 
| 0x1EF00C18
 
|  Setting bit0 starts the transfer. Upon completion, bit0 is unset and bit8 is set.
 
|  Setting bit0 starts the transfer. Upon completion, bit0 is unset and bit8 is set.
 +
|-
 +
| 0x1EF00C1C
 +
|  ?
 
|-
 
|-
 
| 0x1EF00C20
 
| 0x1EF00C20
| Texture info? used with cmd4 ("Size" in [[GSP_Shared_Memory|GX command]])
+
| TextureCopy total amount of data to copy, in bytes.
 
|-
 
|-
 
| 0x1EF00C24
 
| 0x1EF00C24
| Texture info? used with cmd4 ("Input dimensions?" in [[GSP_Shared_Memory|GX command]])
+
| TextureCopy input line width (bits 0-15) and gap (bits 16-31), in 16 byte units.
 
|-
 
|-
 
| 0x1EF00C28
 
| 0x1EF00C28
| Texture info? used with cmd4 ("Output dimensions?" in [[GSP_Shared_Memory|GX command]])
+
| TextureCopy output line width and gap.
 
|}
 
|}
   −
These registers are used by [[GSP_Shared_Memory|GX command]] 3 and 4. For cmd4, *0x1EF00C18 |= 1 is used instead of just writing value 1. The dimensions fields seem to use the same format as [[LCD]] register 0x1EF00X5C. The input framebuffer width for the main screen is normally 480.
+
These registers are used by [[GSP_Shared_Memory|GX command]] 3 and 4. For cmd4, *0x1EF00C18 |= 1 is used instead of just writing value 1. The DisplayTransfer registers are only used if bit 3 of the flags is unset and ignored otherwise. The TextureCopy registers are likewise only used if bit 3 is set, and ignored otherwise.
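The field encodings for a DisplayTransfer can be sketched as below. The helper names are hypothetical. The layout of the input dimensions register is assumed to match the output one (width in bits 0-15, height in bits 16-31), since the table only spells out the bit positions for the output register.

```c
#include <stdint.h>

/* Bit 3 of the flags register selects TextureCopy mode. */
#define XFER_FLAG_TEXTURECOPY (1u << 3)

/* Address registers hold the physical address shifted right by 3. */
static inline uint32_t xfer_addr_field(uint32_t phys_addr)
{
    return phys_addr >> 3;
}

/* Dimension registers: width in bits 0-15, height in bits 16-31
 * (assumed to apply to the input register as well). */
static inline uint32_t xfer_dim_field(uint16_t width, uint16_t height)
{
    return (uint32_t)width | ((uint32_t)height << 16);
}

/* Nonzero when the flags select the TextureCopy register set. */
static inline int xfer_is_texturecopy(uint32_t flags)
{
    return (flags & XFER_FLAG_TEXTURECOPY) != 0;
}
```

Because the low three address bits are dropped by the shift, buffers passed this way have to be 8-byte aligned for the address to survive the encoding.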
   −
==== 0x1EF00C10 ====
+
==== Flags Register - 0x1EF00C10 ====
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
 
!  Bit
 
!  Bit
Line 282: Line 504:  
|-
 
|-
 
| 2
 
| 2
| This bit is set when the out-framebuf width/height is less than the input framebuf width/height, clear otherwise. Bit24 is normally clear when this bit is set.
+
| This bit is required when the output width is less than the input width for the hardware to properly crop the lines, otherwise the output will be mis-aligned.
 
|-
 
|-
 
| 3
 
| 3
| Uses a TextureCopy mode transfer. All other bits in this register seem to be ignored when this is set.
+
| Uses a TextureCopy mode transfer. See below for details.
 
|-
 
|-
 
| 4
 
| 4
Line 309: Line 531:  
|-
 
|-
 
| 16
 
| 16
| Use some kind of 32x32 block swizzling mode, instead of the usual 8x8 one.
+
| Use 32x32 block tiling mode, instead of the usual 8x8 one. Output dimensions must be multiples of 32, even if cropping with bit 2 set above.
 
|-
 
|-
 
| 17-23
 
| 17-23
Line 320: Line 542:  
| Not writable
 
| Not writable
 
|}
 
|}
 +
 +
=== TextureCopy ===
 +
 +
When bit 3 of the flags register (0x1EF00C10) is set, the hardware performs a TextureCopy-mode transfer. In this mode, all other bits of the flags register (except for bit 2, which still needs to be set correctly) and the regular dimension registers are ignored, and no format conversions are done. Instead, it performs a raw data copy from the source to the destination, but with a configurable gap between lines. The total number of bytes to copy is specified in the size register, and the hardware loops reading lines from the input and writing them to the output until this amount is copied. The "gap" specified in the input/output dimension register is the number of chunks to skip after each "width" chunks of the input/output, and is NOT counted towards the total size of the transfer.
 +
 +
By correctly calculating the input and output gap sizes it is possible to use this functionality to copy arbitrary sub-rectangles between differently-sized framebuffers or textures, which is one of its main uses over a regular no-conversion DisplayTransfer. When copying tiled textures/framebuffers it's important to remember that the contents of a tile are laid out sequentially in memory, and so this should be taken into account when calculating the transfer parameters.
 +
 +
Specifying invalid/junk values for the TextureCopy dimensions can result in the GPU hanging while attempting to process this TextureCopy.
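The gap calculation for a sub-rectangle copy might look like the sketch below. It assumes linear (untiled) data, and that the output register uses the same 16-byte units as the input register. The struct and function names are hypothetical. Parameters that cannot be expressed in 16-byte units are rejected rather than rounded, given that bad values may hang the GPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the three TextureCopy register values. */
struct texcopy_params {
    uint32_t size;    /* 0x1EF00C20: total bytes to copy */
    uint32_t in_dim;  /* 0x1EF00C24: input line width and gap */
    uint32_t out_dim; /* 0x1EF00C28: output line width and gap */
};

/* Encode line width (bits 0-15) and gap (bits 16-31), both in
 * 16-byte units. The gap is the rest of the pitch after the line. */
static bool texcopy_line_field(uint32_t line_bytes, uint32_t pitch_bytes,
                               uint32_t *field)
{
    if (line_bytes == 0 || line_bytes > pitch_bytes)
        return false;
    if (line_bytes % 16u != 0 || pitch_bytes % 16u != 0)
        return false;

    uint32_t width = line_bytes / 16u;
    uint32_t gap = (pitch_bytes - line_bytes) / 16u;
    if (width > 0xFFFFu || gap > 0xFFFFu)
        return false;

    *field = width | (gap << 16);
    return true;
}

/* Copy `rows` lines of `line_bytes` each between two linear buffers
 * with different pitches. The gaps do not count towards the size. */
static bool texcopy_setup(uint32_t line_bytes, uint32_t rows,
                          uint32_t src_pitch, uint32_t dst_pitch,
                          struct texcopy_params *out)
{
    uint64_t total = (uint64_t)line_bytes * rows;

    if (rows == 0 || total > UINT32_MAX)
        return false;
    if (!texcopy_line_field(line_bytes, src_pitch, &out->in_dim))
        return false;
    if (!texcopy_line_field(line_bytes, dst_pitch, &out->out_dim))
        return false;

    out->size = (uint32_t)total;
    return true;
}
```

Copying 64 rows of 256 bytes from a buffer with a 1024-byte pitch into one with a 512-byte pitch gives a size of 16384, an input field of 0x00300010 and an output field of 0x00100010. For tiled data the line and pitch values would have to be derived from the tile layout instead.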
    
== Command List ==
 
== Command List ==
Line 336: Line 566:  
|}
 
|}
   −
These 3 registers are used by [[GSP_Shared_Memory|GX command]] 1. This is used for [[GPU_Commands|GPU commands]].
+
These 3 registers are used by [[GSP_Shared_Memory|GX command]] 1. This is used for [[GPU/Internal_Registers|GPU commands]].
    
== Framebuffers ==
 
== Framebuffers ==