GSP Shared Memory: Difference between revisions

From 3dbrew
Smea (talk | contribs)
Kynex7510 (talk | contribs)
m Use conventional names
 
(47 intermediate revisions by 10 users not shown)
Line 1: Line 1:
This page describes the structure of the GSP [[GSPGPU:RegisterInterruptRelayQueue|shared]] memory. GX commands and framebuffer info is stored here, and other unknown data.
This page describes the structure of the GSP [[GSPGPU:RegisterInterruptRelayQueue|shared]] memory. Interrupt, framebuffer, and GX command data is stored here.


=Framebuffer info=
=Interrupt Queue=
The framebuffer info structure for the main LCD is located at sharedmemvadr + 0x200 + threadindex*0x80. The framebuffer info structure for the sub LCD is located at sharedmemvadr + 0x240 + threadindex*0x80.
 
The Interrupt queue is located at sharedMemBase + (clientID * 0x40).
 
{| class="wikitable" border="1"
|-
!  Index Byte
!  Description
|-
| 0x0
| Offset from the count where to save incoming interrupts
|-
| 0x1
| Count (max 0x20 for PDC, 0x34 for others)
|-
| 0x2
| Missed other interrupts (set to 1 when 0 and count >= 0x34)
|-
| 0x3
| Flags (bit0 = skip PDC)
|-
| 0x4-0x7
| Missed PDC0 (incremented when flags.bit0 is clear and count >= 0x20)
|-
| 0x8-0xB
| Missed PDC1 (same as above)
|-
| 0xC-0x3F
| Interrupt list (u8) (0=PSC0, 1=PSC1, 2=PDC0/VBlankTop, 3=PDC1/VBlankBottom, 4=PPF, 5=P3D, 6=DMA)
|}
 
GSP fills the interrupt list, then triggers the event set with [[GSPGPU:RegisterInterruptRelayQueue|RegisterInterruptRelayQueue]] for the specified process(es).
 
PDC interrupts are sent to all processes; other interrupts are only sent to the process with rendering rights.
 
When issuing a [[GSP_Shared_Memory#Trigger_Memory_Fill|Memory Fill]] command with both buffers set GSP will only dispatch PSC0.
 
= Framebuffer Info =
 
The framebuffer info structure for the top LCD is located at sharedMemBase + 0x200 + (clientID * 0x80).
 
The framebuffer info structure for the bottom LCD is located at sharedMemBase + 0x240 + (clientID * 0x80).
 
== Framebuffer Info Header ==


==Framebuffer info header==
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 14: Line 55:
|-
|-
| 1
| 1
| Flag
| Flags (bit0 = client has set new data)
|-
|-
| 3-2
| 3-2
| Padding
| Padding
|}
== Framebuffer Info Structure ==
{| class="wikitable" border="1"
|-
!  Index Word
!  Description
|-
| 0
| Active framebuffer (0 = first, 1 = second)
|-
| 1
| Left framebuffer VA
|-
| 2
| Right framebuffer VA (top screen only)
|-
| 3
| [[GPU/External_Registers#LCD_Source_Framebuffer_Setup|Stride]] (offset 0x90)
|-
| 4
| [[GPU/External_Registers#Framebuffer_format|Format]]
|-
| 5
| [[GPU/External_Registers#LCD_Source_Framebuffer_Setup|Status]] (offset 0x78)
|-
| 6
| ? ("Attribute")
|}
|}


Line 24: Line 94:
The two 0x1C-byte framebuffer info entries are located at framebufferinfo+4.
The two 0x1C-byte framebuffer info entries are located at framebufferinfo+4.


=3D Slider and 3D [[GSPGPU:SetLedForceOff|LED]]=
= 3D Slider and 3D [[GSPGPU:SetLedForceOff|LED]] =
 
See [[Configuration Memory]].
See [[Configuration Memory]].


=Command Buffer Header=
= GX Command Queue =
 
This command queue is located at sharedMemBase + 0x800 + (clientID * 0x200). It consists of an header followed by at most 15 command entries.
 
The queue header has the following structure:
 
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 33: Line 109:
!  Description
!  Description
|-
|-
| 0
| 0
| Current command index. This index is updated by GSP module after loading the command data, right before the command is processed. When this index is updated by GSP module, the total commands field is decreased by one as well.
| Index of the command to process, this is incremented by GSP before handling the command
|-
|-
| 1
| 1
| Total commands to process, must not be value 0 when GSP module handles commands. This must be <=15 when writing a command to shared memory. This is incremented by the application when writing a command to shared memory, after increasing this value [[GSPGPU:TriggerCmdReqQueue|TriggerCmdReqQueue]] is only used if this field is value 1.
| Total commands to process, this is incremented by the application when adding the command to the queue, and decremented by GSP before handling the command
|-
|-
| 2
| 2
| Must not be value 1. When the error-code u32 is set, this u8 is set to value 0x80.
| Status (0x1 = halted, 0x80 = error)
|-
|-
| 3
| 3
| Bit0 must not be set
| When bit0 is set, further processing of commands is halted until the client resets the flag and calls [[GSPGPU:TriggerCmdReqQueue|TriggerCmdReqQueue]]
|-
|-
| 4
| 7-4
| u32 Error code for the last GX command which failed
| Result code for the last command which failed
|}
|}


The command buffer is located at sharedmem + 0x800 + [[GSPGPU:RegisterInterruptRelayQueue|threadindex]]*0x200. After writing the command data to shared memory, [[GSPGPU:TriggerCmdReqQueue|TriggerCmdReqQueue]] must be used to trigger GSP processing for the command when the total commands field is value 1.
After adding a command, [[GSPGPU:TriggerCmdReqQueue|TriggerCmdReqQueue]] must be used to start command processing (official code does so when the total commands field is 1).
 
== Commands ==
 
A command entry is made of 8 words. The first word is the command header, subsequent words represent command specific parameters.
 
The command header has the following structure:


=Command Header=
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 57: Line 138:
!  Description
!  Description
|-
|-
| 0
| 0
| Command ID
| Command ID
|-
| 1
| Unused
|-
|-
| 2-1
| 2
| ?
| When bit0 is set, GSP stops processing further commands (can be used for packing together sets of commands)
|-
|-
| 3
| 3
| When non-zero GSP module may check flags for the specified cmdID, command handling is aborted when the flags are set. The corresponding flag for each CmdID is set once the command is handled by GSP module, this flag is likely cleared once the GPU finishes processing the command.
| When set, the command fails if GSP is busy handling any other command; otherwise, it only fails if GSP is busy handling a command of the same kind
|}
|}


The command is located at cmdbuf + 0x20 + cmdindex*0x20, the size of each command is 0x20-bytes. The command parameters are located at command+4. Addresses specified in parameters are application vaddrs, these are usually located in either the process GSP [[Memory_layout|heap]] or VRAM. For applications these addresses are normally located in the GSP heap, while for other processes these addresses are located in VRAM. Addresses/sizes specified in parameters except for cmd0 and cmd5 must be 8-byte [[GPU|aligned]].
Addresses specified in command parameters are virtual addresses. Depending on the command, there might be constraints on the accepted parameters. In general, some commands require parameters to be aligned, and addresses are expected to be on [[Memory_Management#Memory_Mapping|linear]], [[Memory_layout#0x1F000000_.28New_3DS_only.29|QTM]] or VRAM memory.


=Commands=
=== RequestDMA ===


==GX RequestDma==
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 78: Line 161:
|-
|-
| 0
| 0
| u8 CommandID is 0x00
| Command header (ID = 0x00)
|-
|-
| 1
| 1
Line 89: Line 172:
| Size
| Size
|-
|-
| 7-4
| 6-4
| Unused
| Unused
|-
| 7
| Flush source (0 = don't flush, 1 = flush)
|}
|}


This command is normally used to DMA data from the application GSP [[Memory_layout|heap]] to VRAM.
This command issues a [[Corelink_DMA_Engines|DMA request]] as the process with [[GSPGPU:AcquireRight|rendering rights]]. When the destination address is within VRAM, GSP places itself as the destination process: this makes it possible to transfer data in VRAM without needing it listed in the destination process [[NCCH/Extended_Header#ARM11_Kernel_Capabilities|exheader mappings]]. Otherwise, both source and destination of the DMA request are the process with rendering rights.
 
The source buffer must be mapped as readable in the source process, while the destination address must be mapped as writable in the destination process, otherwise GSP calls [[SVC|svcBreak]]. When flushing is enabled and the source address is above VRAM, svcFlushProcessDataCache is used to flush the source buffer.
 
Any process must have acquired rendering rights, otherwise the command does nothing.
 
=== ProcessCommandList ===


==GX SetCommandList Last==
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 102: Line 193:
|-
|-
| 0
| 0
| u8 CommandID is 0x01
| Command header (ID = 0x01)
|-
|-
| 1
| 1
Line 111: Line 202:
|-
|-
| 3
| 3
| Flag, bit0 is written to GSP module state
| Update gas additive blend results (0 = don't update, 1 = update)
|-
|-
| 6-4
| 6-4
Line 117: Line 208:
|-
|-
| 7
| 7
| When non-zero, call svcFlushProcessDataCache() with the specified buffer
| Flush buffer (0 = don't flush, 1 = flush)
|}
|}


This command converts the specified address to a physical address, then writes the physical address and size to the [[GPU]] registers at 0x1EF018E0. This buffer contains [[GPU_Commands|GPU commands]].
This command sets the [[GPU/External_Registers#Command_List|Command List registers]], and optionally updates gas additive blend results after command processing has ended.
 
No error checking is performed on the parameters. Address and size should be both aligned to 8 bytes, and the address should be in linear, QTM or VRAM memory, otherwise PA 0 is used. When flushing is enabled, [[SVC|svcFlushProcessDataCache]] is used to flush the buffer on the process that has acquired rendering rights.
 
Any process must have acquired rendering rights, otherwise the command does nothing.
 
=== MemoryFill ===


==GX SetMemoryFill==
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 129: Line 225:
|-
|-
| 0
| 0
| u8 CommandID is 0x02
| Command header (ID = 0x02)
|-
|-
| 1
| 1
| Buf0 start address
| Buffer 0 start address
|-
|-
| 2
| 2
| Buf0 value
| Buffer 0 value
|-
|-
| 3
| 3
| Buf0 end address
| Buffer 0 end address
|-
|-
| 4
| 4
| Buf1 start address
| Buffer 1 start address
|-
|-
| 5
| 5
| Buf1 value
| Buffer 1 value
|-
|-
| 6
| 6
| Buf1 end address
| Buffer 1 end address
|-
|-
| 7
| 7
| The low u16 is width0, while the high u16 is width1 (?)
| Control0 <nowiki>|</nowiki> (Control1 << 16)
|}
|}


This commands converts the specified addresses to physical addresses, then writes these addresses and the specified parameters to the [[GPU]] registers at 0x1EF00010 and 0x1EF00020. Doing so fills the specified buffers with the associated 4-byte value. This is used to clear GPU framebuffers.
This command sets the [[GPU/External_Registers#Memory_Fill|Memory Fill registers]].
The associated buffer address must not be <= to the main buffer address, thus the associated buffer address must not be zero as well. When the bufX address is zero, processing for the bufX parameters is skipped.
 
Addresses should be aligned to 8 bytes and must be in linear, QTM or VRAM memory, otherwise error 0xE0E02BF5 (GSP_INVALID_ADDRESS) is returned. The start address for a buffer must be below its end address, else the same error is returned. If the start address for a buffer is 0, that buffer is skipped; otherwise, its relative PSC unit is used for the fill operation.
 
=== DisplayTransfer ===


==GX SetDisplayTransfer==
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 163: Line 261:
|-
|-
| 0
| 0
| u8 CommandID is 0x03
| Command header (ID = 0x03)
|-
|-
| 1
| 1
| Input framebuffer address
| Source address
|-
|-
| 2
| 2
| Output framebuffer address
| Destination address
|-
|-
| 3
| 3
| Input framebuffer [[GPU|dimensions]]
| Source dimensions
|-
|-
| 4
| 4
| Output framebuffer dimensions
| Output dimensions
|-
|-
| 5
| 5
| [[GPU|Flags]], for applications this is 0x1001000 for the main screen, and 0x1000 for the sub screen.
| Flags
|-
|-
| 7-6
| 7-6
Line 184: Line 282:
|}
|}


This command converts the specified addresses to physical addresses, then writes these physical addresses and parameters to the [[GPU]] registers at 0x1EF00C00. This GPU command copies the already rendered framebuffer data from the input GPU framebuffer address to the specified output LCD framebuffer. The input framebuffer is normally located in VRAM. Note that unlike the LCD framebuffers, the GPU framebuffer seems to use fixed-point/floats for the color format.
This command sets the [[GPU/External_Registers#Transfer_Engine|Display Transfer registers]].
 
No error checking is performed on the parameters. Addresses should be aligned to 8 bytes and should be in linear, QTM or VRAM memory, otherwise PA 0 is used.
 
=== TextureCopy ===


==GX SetTextureCopy==
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 193: Line 294:
|-
|-
| 0
| 0
| u8 CommandID is 0x04
| Command header (ID = 0x04)
|-
|-
| 1
| 1
| Input buffer address
| Source address
|-
|-
| 2
| 2
| Output buffer address
| Destination address
|-
|-
| 3
| 3
Line 205: Line 306:
|-
|-
| 4
| 4
| Input [[GPU|dimensions]]?
| Line width <nowiki>|</nowiki> (gap << 16)
|-
|-
| 5
| 5
| Output dimensions?
| Same as above, for the destination
|-
|-
| 6
| 6
| Flags, normally this is 0x8, with bit2 optionally set when either of the dimensions fields are set.
| Flags
|-
|-
| 7
| 7
Line 217: Line 318:
|}
|}


This command is similar to cmd3, this command also writes to the [[GPU]] registers at 0x1EF00C00.
This command sets the [[GPU/External_Registers#TextureCopy|Texture Copy registers]]. Note that GSP doesn't enforce bit3 of the flags to be set.
 
No error checking is performed on the parameters. Addresses and size should be aligned to 8 bytes, and the addresses should be in linear, QTM or VRAM memory, otherwise PA 0 is used.
 
=== FlushCacheRegions ===


==GX SetCommandList First ==
{| class="wikitable" border="1"
{| class="wikitable" border="1"
|-
|-
Line 226: Line 330:
|-
|-
| 0
| 0
| u8 CommandID is 0x05
| Command header (ID = 0x05)
|-
|-
| 1
| 1
| Buf0 address
| Buffer 0 address
|-
|-
| 2
| 2
| Buf0 size
| Buffer 0 size
|-
|-
| 3
| 3
| Buf1 address
| Buffer 1 address
|-
|-
| 4
| 4
| Buf1 size
| Buffer 1 size
|-
|-
| 5
| 5
| Buf2 address
| Buffer 2 address
|-
|-
| 6
| 6
| Buf2 size
| Buffer 2 size
|-
|-
| 7
| 7
Line 250: Line 354:
|}
|}


The application buffer addresses specified in the parameters are used with [[SVC|svcFlushProcessDataCache]]. The input buf0 size must not be zero. When buf1 size is zero, svcFlushProcessDataCache() for buf1 and buf2 are skipped. When buf2 size is zero, svcFlushProcessDataCache() for buf2 is skipped.
This command calls svcFlushProcessDataCache for each buffer on the process that has acquired rendering rights.
 
If any call fails, its error is returned; If any buffer has size 0, the buffer is skipped. In both cases, subsequent buffers are not processed.
 
Any process must have acquired rendering rights, otherwise the error 0xD8202A06 (GSP_NO_RIGHT) is returned.
 
== Bugs ==
 
* When issuing a DMA request, GSP attempts to acquire an internal semaphore that rules CDMA access; this semaphore is never released on failure paths. While this is generally not an issue, as GSP breaks on DMA failures, it becomes a problem if the DMA request is done with cache flushing: in that case, GSP will error silently, causing a deadlock in DMA code.
* When handling GX commands apart from RequestDMA and ProcessCommandList, GSP sets the relative busy flags in internal state before executing the commands. This means that, if the relevant interrupts are never triggered (eg. on invalid parameters), the busy flags never get reset, preventing execution of future commands of the same kind.

Latest revision as of 01:10, 30 May 2025

This page describes the structure of the GSP shared memory. Interrupt, framebuffer, and GX command data is stored here.

Interrupt Queue

The Interrupt queue is located at sharedMemBase + (clientID * 0x40).

Index Byte Description
0x0 Offset from the count where to save incoming interrupts
0x1 Count (max 0x20 for PDC, 0x34 for others)
0x2 Missed other interrupts (set to 1 when 0 and count >= 0x34)
0x3 Flags (bit0 = skip PDC)
0x4-0x7 Missed PDC0 (incremented when flags.bit0 is clear and count >= 0x20)
0x8-0xB Missed PDC1 (same as above)
0xC-0x3F Interrupt list (u8) (0=PSC0, 1=PSC1, 2=PDC0/VBlankTop, 3=PDC1/VBlankBottom, 4=PPF, 5=P3D, 6=DMA)

GSP fills the interrupt list, then signals the event that was registered with RegisterInterruptRelayQueue for the specified process(es).

PDC interrupts are sent to all processes; other interrupts are only sent to the process with rendering rights.

When a Memory Fill command is issued with both buffers set, GSP dispatches only PSC0.
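
The layout in the table above can be sketched as a C structure. This is only an illustration: the type and function names are hypothetical, and the consumer logic assumes that byte 0x0 indexes the oldest pending entry and that the list wraps after 0x34 entries, which the page does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GSP_IRQ_QUEUE_SIZE 0x40 /* bytes reserved per client */
#define GSP_IRQ_LIST_LEN   0x34 /* interrupt list spans bytes 0xC-0x3F */

/* Hypothetical names; field offsets follow the table above. */
typedef struct {
    uint8_t  offset;                 /* 0x0 */
    uint8_t  count;                  /* 0x1 */
    uint8_t  missed_other;           /* 0x2 */
    uint8_t  flags;                  /* 0x3, bit0 = skip PDC */
    uint32_t missed_pdc0;            /* 0x4-0x7 */
    uint32_t missed_pdc1;            /* 0x8-0xB */
    uint8_t  list[GSP_IRQ_LIST_LEN]; /* 0xC-0x3F */
} GspInterruptQueue;

static_assert(sizeof(GspInterruptQueue) == GSP_IRQ_QUEUE_SIZE, "queue is 0x40 bytes");
static_assert(offsetof(GspInterruptQueue, list) == 0xC, "list starts at 0xC");

/* Byte offset of a client's queue from sharedMemBase. */
static size_t gsp_irq_queue_offset(unsigned client_id)
{
    return (size_t)client_id * GSP_IRQ_QUEUE_SIZE;
}

/* Pops the oldest pending interrupt ID, or returns -1 when the queue is empty.
 * ASSUMPTION: 'offset' indexes the oldest entry and the list is circular.
 * Not atomic: real code must synchronize the header update with GSP. */
static int gsp_irq_pop(GspInterruptQueue *q)
{
    if (q->count == 0)
        return -1;
    int id = q->list[q->offset];
    q->offset = (uint8_t)((q->offset + 1) % GSP_IRQ_LIST_LEN);
    q->count--;
    return id;
}
```

A client would typically wait on its registered event, then call a function like gsp_irq_pop in a loop until it returns -1.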

Framebuffer Info

The framebuffer info structure for the top LCD is located at sharedMemBase + 0x200 + (clientID * 0x80).

The framebuffer info structure for the bottom LCD is located at sharedMemBase + 0x240 + (clientID * 0x80).

Framebuffer Info Header

Index Byte Description
0 Framebuffer info entry index
1 Flags (bit0 = client has set new data)
3-2 Padding

Framebuffer Info Structure

Index Word Description
0 Active framebuffer (0 = first, 1 = second)
1 Left framebuffer VA
2 Right framebuffer VA (top screen only)
3 Stride (offset 0x90)
4 Format
5 Status (offset 0x78)
6 ? ("Attribute")

To set the framebuffer info, a process first sets the index to (index+1) & 1, then writes the framebuffer info entry, and finally sets the flag to 1. The GSP module loads this entry into GSP state once the GPU finishes processing GX command 3 or 4 (DisplayTransfer or TextureCopy). Once it has finished loading the entry, it sets the flag to 0 and does not load the framebuffer info again until the flag is 1. After loading the entry into GSP state, the GSP module writes this framebuffer state to the LCD registers.

The GSP module also updates the LCD framebuffer registers each time GX command 3 or 4 finishes, even when the application has not updated this shared memory data. In that case the shared memory data is not used, and the GSP module toggles the active framebuffer register instead.

The two 0x1C-byte framebuffer info entries are located at framebufferinfo+4.
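
As a rough sketch, the header, the two entries, and the update sequence described above could be written in C as follows. All names are hypothetical, and the code assumes that the new entry is written to the slot selected by the new index; memory barriers and volatile accesses, which real code sharing memory with GSP would need, are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One framebuffer info entry: 7 words = 0x1C bytes. */
typedef struct {
    uint32_t active;    /* word 0: 0 = first, 1 = second */
    uint32_t left_va;   /* word 1 */
    uint32_t right_va;  /* word 2: top screen only */
    uint32_t stride;    /* word 3 */
    uint32_t format;    /* word 4 */
    uint32_t status;    /* word 5 */
    uint32_t attribute; /* word 6: meaning unknown */
} GspFramebufferEntry;

/* 4-byte header followed by the two entries at +4. */
typedef struct {
    uint8_t index;      /* byte 0: framebuffer info entry index */
    uint8_t flags;      /* byte 1: bit0 = client has set new data */
    uint8_t padding[2]; /* bytes 3-2 */
    GspFramebufferEntry entries[2];
} GspFramebufferInfo;

static_assert(sizeof(GspFramebufferEntry) == 0x1C, "entry is 0x1C bytes");
static_assert(offsetof(GspFramebufferInfo, entries) == 4, "entries start at +4");
static_assert(sizeof(GspFramebufferInfo) <= 0x40, "fits in the per-screen slot");

/* Byte offset from sharedMemBase: 0x200 for top, 0x240 for bottom. */
static size_t gsp_fb_info_offset(unsigned client_id, int bottom)
{
    return (bottom ? 0x240u : 0x200u) + (size_t)client_id * 0x80u;
}

/* Publishes a new entry: flip index, write entry, then raise the flag.
 * ASSUMPTION: the entry goes into the slot named by the new index. */
static void gsp_fb_info_present(GspFramebufferInfo *info, const GspFramebufferEntry *entry)
{
    uint8_t next = (uint8_t)((info->index + 1) & 1);
    info->index = next;
    info->entries[next] = *entry;
    info->flags |= 1;
}
```

The size checks also confirm that a header plus two 0x1C-byte entries (0x3C bytes) fits in the 0x40 bytes between the top and bottom structures.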

3D Slider and 3D LED

See Configuration Memory.

GX Command Queue

This command queue is located at sharedMemBase + 0x800 + (clientID * 0x200). It consists of a header followed by at most 15 command entries.

The queue header has the following structure:

Index Byte Description
0 Index of the command to process; GSP increments this before handling the command
1 Total commands to process; the application increments this when adding a command to the queue, and GSP decrements it before handling the command
2 Status (0x1 = halted, 0x80 = error)
3 When bit0 is set, further processing of commands is halted until the client resets the flag and calls TriggerCmdReqQueue
7-4 Result code for the last command which failed

After adding a command, TriggerCmdReqQueue must be used to start command processing (official code does so when the total commands field is 1).
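
The following C sketch models the queue and the "trigger when the total becomes 1" rule. It is not the official implementation: the names are hypothetical, the 0x20-byte header size is inferred from 0x200 bytes per client and 15 entries of 8 words each, and the choice of slot, (index + total) % 15, is an assumption. Real code would also need to update the header atomically with respect to GSP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GX_QUEUE_CAPACITY 15
#define GX_COMMAND_WORDS  8

typedef struct {
    uint8_t  index;          /* byte 0: index of the command to process */
    uint8_t  total;          /* byte 1: total commands to process */
    uint8_t  status;         /* byte 2: 0x1 = halted, 0x80 = error */
    uint8_t  halt;           /* byte 3: bit0 halts further processing */
    uint32_t result;         /* bytes 7-4: result of the last failed command */
    uint8_t  reserved[0x18]; /* ASSUMPTION: header is padded to 0x20 bytes */
} GxQueueHeader;

typedef struct {
    GxQueueHeader header;
    uint32_t entries[GX_QUEUE_CAPACITY][GX_COMMAND_WORDS];
} GxCommandQueue;

static_assert(sizeof(GxCommandQueue) == 0x200, "queue is 0x200 bytes per client");

/* Byte offset of a client's queue from sharedMemBase. */
static size_t gx_queue_offset(unsigned client_id)
{
    return 0x800u + (size_t)client_id * 0x200u;
}

/* Adds a command to the queue.
 * Returns -1 if the queue is full, 1 if the caller should now call
 * TriggerCmdReqQueue (total became 1), and 0 otherwise.
 * ASSUMPTION: the next free slot is (index + total) % 15. */
static int gx_queue_push(GxCommandQueue *q, const uint32_t cmd[GX_COMMAND_WORDS])
{
    if (q->header.total >= GX_QUEUE_CAPACITY)
        return -1;
    unsigned slot = (unsigned)(q->header.index + q->header.total) % GX_QUEUE_CAPACITY;
    memcpy(q->entries[slot], cmd, sizeof q->entries[slot]);
    q->header.total++;
    return q->header.total == 1;
}
```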

Commands

A command entry is made of 8 words. The first word is the command header; subsequent words hold command-specific parameters.

The command header has the following structure:

Index Byte Description
0 Command ID
1 Unused
2 When bit0 is set, GSP stops processing further commands (can be used for packing together sets of commands)
3 When set, the command fails if GSP is busy handling any other command; otherwise, it only fails if GSP is busy handling a command of the same kind

Addresses specified in command parameters are virtual addresses. Depending on the command, there might be constraints on the accepted parameters. In general, some commands require parameters to be aligned, and addresses are expected to be in linear, QTM or VRAM memory.
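
For illustration, the four header bytes can be packed into the first command word as below. The helper names are hypothetical, and the code assumes little-endian byte order (byte 0 is the least significant byte of the word), as on the 3DS ARM11 cores.

```c
#include <assert.h>
#include <stdint.h>

/* Packs a command header word.
 * byte 0: command ID
 * byte 1: unused
 * byte 2: bit0 = stop processing further commands after this one
 * byte 3: non-zero = fail if GSP is busy with any other command
 * ASSUMPTION: little-endian layout of the header bytes within the word. */
static uint32_t gx_command_header(uint8_t id, int stop_after, int fail_if_busy)
{
    return (uint32_t)id
         | ((uint32_t)(stop_after ? 1 : 0) << 16)
         | ((uint32_t)(fail_if_busy ? 1 : 0) << 24);
}

/* Extracts the command ID from a header word. */
static uint8_t gx_command_id(uint32_t header)
{
    return (uint8_t)(header & 0xFF);
}
```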

RequestDMA

Index Word Description
0 Command header (ID = 0x00)
1 Source address
2 Destination address
3 Size
6-4 Unused
7 Flush source (0 = don't flush, 1 = flush)

This command issues a DMA request as the process with rendering rights. When the destination address is within VRAM, GSP places itself as the destination process: this makes it possible to transfer data in VRAM without needing it listed in the destination process exheader mappings. Otherwise, both source and destination of the DMA request are the process with rendering rights.

The source buffer must be mapped as readable in the source process, while the destination address must be mapped as writable in the destination process, otherwise GSP calls svcBreak. When flushing is enabled and the source address is above VRAM, svcFlushProcessDataCache is used to flush the source buffer.

A process must have acquired rendering rights; otherwise, the command does nothing.

ProcessCommandList

Index Word Description
0 Command header (ID = 0x01)
1 Buffer address
2 Buffer size
3 Update gas additive blend results (0 = don't update, 1 = update)
6-4 Unused
7 Flush buffer (0 = don't flush, 1 = flush)

This command sets the Command List registers, and optionally updates gas additive blend results after command processing has ended.

No error checking is performed on the parameters. Address and size should be both aligned to 8 bytes, and the address should be in linear, QTM or VRAM memory, otherwise PA 0 is used. When flushing is enabled, svcFlushProcessDataCache is used to flush the buffer on the process that has acquired rendering rights.

A process must have acquired rendering rights; otherwise, the command does nothing.

MemoryFill

Index Word Description
0 Command header (ID = 0x02)
1 Buffer 0 start address
2 Buffer 0 value
3 Buffer 0 end address
4 Buffer 1 start address
5 Buffer 1 value
6 Buffer 1 end address
7 Control0 | (Control1 << 16)

This command sets the Memory Fill registers.

Addresses should be aligned to 8 bytes and must be in linear, QTM or VRAM memory; otherwise, error 0xE0E02BF5 (GSP_INVALID_ADDRESS) is returned. The start address for a buffer must be below its end address, or the same error is returned. If the start address for a buffer is 0, that buffer is skipped; otherwise, its corresponding PSC unit is used for the fill operation.
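
A minimal sketch of building this command and applying the start/end rule is shown below. The structure and function names are hypothetical, and the check for linear, QTM or VRAM memory is deliberately omitted, so this covers only part of what GSP validates.

```c
#include <assert.h>
#include <stdint.h>

#define GSP_INVALID_ADDRESS 0xE0E02BF5u

/* Hypothetical per-buffer parameters. */
typedef struct {
    uint32_t start;   /* 0 means "skip this buffer" */
    uint32_t value;
    uint32_t end;
    uint16_t control;
} GxFillBuffer;

/* Fills 'out' with a MemoryFill command.
 * Returns 0 on success, or GSP_INVALID_ADDRESS when a non-skipped buffer
 * has start >= end. The memory region check is NOT modelled here. */
static uint32_t gx_build_memory_fill(uint32_t out[8],
                                     const GxFillBuffer *b0,
                                     const GxFillBuffer *b1)
{
    const GxFillBuffer *bufs[2] = { b0, b1 };
    for (int i = 0; i < 2; i++) {
        if (bufs[i]->start != 0 && bufs[i]->start >= bufs[i]->end)
            return GSP_INVALID_ADDRESS;
    }
    out[0] = 0x02; /* command header, ID = 0x02, no header flags */
    out[1] = b0->start;
    out[2] = b0->value;
    out[3] = b0->end;
    out[4] = b1->start;
    out[5] = b1->value;
    out[6] = b1->end;
    out[7] = (uint32_t)b0->control | ((uint32_t)b1->control << 16);
    return 0;
}
```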

DisplayTransfer

Index Word Description
0 Command header (ID = 0x03)
1 Source address
2 Destination address
3 Source dimensions
4 Destination dimensions
5 Flags
7-6 Unused

This command sets the Display Transfer registers.

No error checking is performed on the parameters. Addresses should be aligned to 8 bytes and should be in linear, QTM or VRAM memory, otherwise PA 0 is used.

TextureCopy

Index Word Description
0 Command header (ID = 0x04)
1 Source address
2 Destination address
3 Size
4 Line width | (gap << 16)
5 Same as above, for the destination
6 Flags
7 Unused

This command sets the Texture Copy registers. Note that GSP doesn't enforce bit3 of the flags to be set.

No error checking is performed on the parameters. Addresses and size should be aligned to 8 bytes, and the addresses should be in linear, QTM or VRAM memory, otherwise PA 0 is used.
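
The packed words 4 and 5 can be built as in the following sketch. The names are hypothetical, and no unit is assumed for the line width or gap, since the table above does not give one.

```c
#include <assert.h>
#include <stdint.h>

#define GX_TEXCOPY_FLAG_BIT3 (1u << 3) /* not enforced by GSP, per the note above */

/* Packs "line width | (gap << 16)" for words 4 and 5. */
static uint32_t gx_texcopy_dims(uint16_t line_width, uint16_t gap)
{
    return (uint32_t)line_width | ((uint32_t)gap << 16);
}

/* Fills 'out' with a TextureCopy command. No validation is done, which
 * mirrors the statement that GSP performs no error checking here. */
static void gx_build_texture_copy(uint32_t out[8],
                                  uint32_t src, uint32_t dst, uint32_t size,
                                  uint32_t src_dims, uint32_t dst_dims,
                                  uint32_t flags)
{
    out[0] = 0x04; /* command header, ID = 0x04, no header flags */
    out[1] = src;
    out[2] = dst;
    out[3] = size;
    out[4] = src_dims;
    out[5] = dst_dims;
    out[6] = flags;
    out[7] = 0;    /* unused */
}
```

Since GSP does not force bit3 of the flags, a caller using a helper like this would pass GX_TEXCOPY_FLAG_BIT3 explicitly.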

FlushCacheRegions

Index Word Description
0 Command header (ID = 0x05)
1 Buffer 0 address
2 Buffer 0 size
3 Buffer 1 address
4 Buffer 1 size
5 Buffer 2 address
6 Buffer 2 size
7 Unused

This command calls svcFlushProcessDataCache for each buffer on the process that has acquired rendering rights.

If any call fails, its error is returned; if any buffer has size 0, that buffer is skipped. In both cases, subsequent buffers are not processed.

A process must have acquired rendering rights; otherwise, the error 0xD8202A06 (GSP_NO_RIGHT) is returned.
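
The behaviour described above can be modelled with the short C sketch below. The flush callback stands in for svcFlushProcessDataCache so the logic can run on any host; all names are hypothetical, and performing the rendering rights check before touching any buffer is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define GSP_NO_RIGHT 0xD8202A06u

/* Stand-in for svcFlushProcessDataCache; returns 0 on success. */
typedef uint32_t (*GxFlushFn)(uint32_t addr, uint32_t size, void *ctx);

/* Models FlushCacheRegions: flushes up to three buffers in order and
 * stops at the first buffer with size 0 or at the first failing call.
 * ASSUMPTION: the rights check happens before any buffer is processed. */
static uint32_t gx_flush_cache_regions(const uint32_t cmd[8], int has_right,
                                       GxFlushFn flush, void *ctx)
{
    if (!has_right)
        return GSP_NO_RIGHT;
    for (int i = 0; i < 3; i++) {
        uint32_t addr = cmd[1 + 2 * i];
        uint32_t size = cmd[2 + 2 * i];
        if (size == 0)
            break;     /* this buffer and all later ones are not processed */
        uint32_t rc = flush(addr, size, ctx);
        if (rc != 0)
            return rc; /* the error of the failing call is returned */
    }
    return 0;
}

/* Example callback that only counts how many flushes were requested. */
static uint32_t gx_count_flush(uint32_t addr, uint32_t size, void *ctx)
{
    (void)addr;
    (void)size;
    ++*(unsigned *)ctx;
    return 0;
}
```

With buffer 1 given a size of 0, only buffer 0 is flushed, even if buffer 2 has a non-zero size.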

Bugs

  • When issuing a DMA request, GSP attempts to acquire an internal semaphore that guards CDMA access; this semaphore is never released on failure paths. This is generally not an issue, as GSP breaks on DMA failures, but it becomes a problem if the DMA request is made with cache flushing: in that case, GSP fails silently, causing a deadlock in DMA code.
  • When handling GX commands other than RequestDMA and ProcessCommandList, GSP sets the corresponding busy flags in its internal state before executing the command. This means that, if the relevant interrupts are never triggered (e.g. because of invalid parameters), the busy flags are never reset, preventing execution of future commands of the same kind.