Any idea what is wrong with the unrolled loop?

By xchip

Rookie (26)

09-04-2022, 14:51

The rolled loop is taken from MSXgl and works fine. The unrolled loop shows corruption once every 3~4 frames on openMSX (https://youtu.be/ceRuUXIWX2M), as if it were copying data to the wrong destination. It seems to work fine in Emulicious and the WebMSX emulator.

Any tips on how to debug this are welcome!

Rolled version, working as expected:

void VDP_write_16K(u8 count, const u8* src) __sdcccall(1)
{
	src, count;
	__asm
		//exit if count is 0
        or a
        jr z, vdp_write_wrt16_exit_loop 
        ld		b, a				// count
        
		ld		c, #P_VDP_DATA	      
        ex      de, hl
       
        // Fast loop	        
	write_wrt16_loop_start:
		outi							// out(c) ; hl++ ; b--
		jp		nz, write_wrt16_loop_start
        
vdp_write_wrt16_exit_loop:

	__endasm;
}

Unrolled version

void VDP_write_16K_unrolled(u8 count, const u8* src) __sdcccall(1)
{
	src, count;
	__asm
		//exit if count is 0
        or a
        jr z, vdp_write_wrt16_exit_loop_unrolled 
        
		ld		c, #P_VDP_DATA	      
        ex      de, hl
       
        ld		d, a				// count

        // slow loop, individual outs	        
        and		a, #0x07 
        jr		z, write_wrt16_loop_start_unrolled
        ld		b,a
    write_wrt16_loop_start_1:        
        outi							// out(c) ; hl++ ; b--
        jp		nz, write_wrt16_loop_start_1
    
    write_wrt16_loop_start_unrolled:
        
        ld		a, d
        
        // fast loop, outs unrolled in blocks of 8	        
        and		a, #0xf8 
        jr		z, vdp_write_wrt16_exit_loop_unrolled
        ld		b,a        
	write_wrt16_loop_start_2:
        .rept 8            
		outi							// out(c) ; hl++ ; b--
        .endm
		jp		nz, write_wrt16_loop_start_2
        
vdp_write_wrt16_exit_loop_unrolled:

	__endasm;
}

By Grauw

Ascended (10605)

09-04-2022, 15:35

There is a speed limit for VRAM access on real hardware. The VDP has to share VRAM access between display generation and the CPU-VRAM interface, and for this there is a mechanism called “access slots”. There are a few free slots in between the VDP's display accesses, and every CPU-VRAM I/O waits for the next free slot before it is carried out. However, if you write faster than these slots become available, the writes get corrupted.

It is similar to how on the NES you can not access the VRAM except during vertical blanking. On MSX you can also access VRAM during display thanks to the access slots, but at a limited rate. Like the NES, on MSX, too, there is no limit during vertical blanking.

Currently openMSX is the only emulator that emulates this; WebMSX does not, and it seems Emulicious does not either.

By xchip

Rookie (26)

09-04-2022, 15:53

Where can I read more about that? (BTW I tried some simple experiments to repro the corruption but I couldn't get it to glitch)

By Grauw

Ascended (10605)

09-04-2022, 16:07

The TMS9918 access timings in each screen mode are described in a simple table in the TMS9918 application manual on page 2-4 (PDF page 13).

Additionally, there is a high level of technical detail on the internal mechanism here:

V9938 VRAM timings
V9938 VRAM timings, part II

It’s about V9938 but also goes into TMS9918, and the concepts are similar.

I’m sure there are also previous discussions on the topic here on the forums and perhaps on the wiki, but I don’t have links ready for that.

By xchip

Rookie (26)

09-04-2022, 16:45

Thanks, I didn't know about that, although I suspected something like that might be happening. BTW those pages have lots of detail. A few questions:

- Are there any simple rules of thumb I should follow?
- Any tricks to max out the writing speed without being too fast? I'd love to see some code :)
- Anything useful I could do instead of waiting?

By aoineko

Champion (489)

09-04-2022, 18:53

At the top of vdp.h, you can find the timing table for each screen mode (according to Grauw's MAP) in the worst-case scenario (outside v-blank, with sprites enabled) and the corresponding assembler functions.

For my basic version of MSXgl, I preferred to go with the worst-case scenarios. Later, I plan to add defines that let you choose one or more versions of the VRAM access functions depending on the desired access speed.

By thegeps

Paragon (1035)

09-04-2022, 21:32

Outside the vblank you have almost perfect timing with this code:

vram_write_loop:
     outi
     jp nz,vram_write_loop

Inside the vblank you can unroll as much as you want
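
A rough way to see why the rolled loop is "almost perfect" outside vblank while back-to-back OUTIs are too fast is to compare cycles per byte against the VDP's worst-case access time. The sketch below is plain C for a desktop compiler, not MSX code, and every constant in it is an assumption rather than something stated in this thread: the usual 3.579545 MHz MSX CPU clock, Z80 instruction timings including the extra M1 wait state MSX machines add, and the 8 µs worst case commonly quoted from the TMS9918 manual's table for Graphics I/II during active display. The helper names are made up for illustration.

```c
/* All constants are assumptions based on commonly cited MSX/Z80 figures. */
#define Z80_CLOCK_HZ      3579545.0 /* MSX CPU clock */
#define OUTI_TSTATES      18        /* 16 T-states + 2 M1 wait states on MSX */
#define JP_TSTATES        11        /* 10 T-states + 1 M1 wait state on MSX */
#define WORST_CASE_GAP_US 8.0       /* TMS9918, Graphics I/II, active display */

/* Microseconds between consecutive VRAM writes for a given per-byte cost. */
static double us_per_byte(int tstates_per_byte)
{
    return tstates_per_byte * 1e6 / Z80_CLOCK_HZ;
}

/* Non-zero when writes are spaced at least the worst-case gap apart. */
static int is_safe_in_active_display(int tstates_per_byte)
{
    return us_per_byte(tstates_per_byte) >= WORST_CASE_GAP_US;
}
```

Under those assumptions, `outi` plus `jp nz` costs 29 cycles (about 8.1 µs) per byte and passes the check, while unrolled `outi`s at 18 cycles (about 5.0 µs) do not, which would match the corruption seen with the unrolled version during active display.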

By xchip

Rookie (26)

10-04-2022, 01:39

I am copying to VRAM right after the VDP interrupt, so in theory I should be inside the vblank, shouldn't I?

By santiontanon

Paragon (1639)

10-04-2022, 04:21

Are you trying to copy a lot of data? Even if you are inside vblank, there is not that much time, and you can only copy a small amount of data before vblank is over (I once calculated how many bytes were safe to send at high speed, but I can't remember now haha).
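
For a ballpark figure, the budget can be estimated from the number of non-display scanlines. This is only a sketch in plain C for a desktop compiler, and all of its numbers are assumptions: about 228 CPU cycles per scanline, 192 active lines, 262 total lines at 60 Hz or 313 at 50 Hz, 18 cycles per unrolled `outi`, and fast access being possible on every line outside the active display. The `overhead_cycles` parameter is a hypothetical placeholder for whatever the interrupt handler and setup code consume before the copy starts.

```c
/* All constants are assumptions; measure on your own setup before relying on them. */
#define CYCLES_PER_LINE  228 /* approximate CPU cycles per scanline */
#define ACTIVE_LINES     192 /* scanlines with display fetches */
#define TOTAL_LINES_60HZ 262
#define TOTAL_LINES_50HZ 313
#define OUTI_TSTATES     18  /* unrolled outi, including MSX wait states */

/* Estimated bytes that fit between the end of one active display and the next. */
static long blank_bytes(int total_lines, long overhead_cycles)
{
    long cycles = (long)(total_lines - ACTIVE_LINES) * CYCLES_PER_LINE
                  - overhead_cycles;
    return cycles > 0 ? cycles / OUTI_TSTATES : 0;
}
```

With zero overhead this gives roughly 886 bytes at 60 Hz and 1532 bytes at 50 Hz. Real code has less, since the interrupt handler and any other per-frame work eat into the same window.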

By aoineko

Champion (489)

10-04-2022, 08:28

@xchip Call VDP_SetColor(8) at the beginning of your copy and VDP_SetColor(1) at the end. If the red border leaks into the top of the screen, you are copying outside v-blank.
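
As a sketch, the border trick amounts to wrapping the copy like this. The wrapper name `copy_with_border_marker` is hypothetical, and the two stub functions exist only so the snippet compiles on a desktop compiler. On the MSX you would drop the stubs and call MSXgl's VDP_SetColor and the copy routine from the first post instead.

```c
typedef unsigned char u8;

/* Host-side stand-ins (assumptions): they only record the border colours. */
static u8  border_log[4];
static int border_log_len = 0;

static void VDP_SetColor(u8 color)
{
    if (border_log_len < 4)
        border_log[border_log_len++] = color;
}

static void VDP_write_16K_unrolled(u8 count, const u8 *src)
{
    (void)count; /* the real routine streams count bytes to VRAM */
    (void)src;
}

/* Hypothetical wrapper: the border is red for exactly as long as the copy runs. */
static void copy_with_border_marker(u8 count, const u8 *src)
{
    VDP_SetColor(8);                    /* red border: copy starts */
    VDP_write_16K_unrolled(count, src);
    VDP_SetColor(1);                    /* black border: copy done */
}
```

If the red band stays within the bottom and top borders, the copy fits inside v-blank. If it reaches into the picture area, the copy is running during active display.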

By xchip

Rookie (26)

10-04-2022, 17:59

Oh, this is what I was after, thanks!
