How to change the R18 value while the VDP is running a command

By PingPong

Enlighted (4156)

24-12-2012, 11:56

I've done some investigation and finally found that it is possible.
1) Changing R18 while in VBLANK DOES NOT WORK: there is a glitch.
2) To change R18 safely, one must be in HBLANK.

This dirty code works on a plain MSX2:

	ld a,2			; select status register S#2 via R#15
	out (0x99),a
	ld a,128+15
	out (0x99),a
	ld b,3
3:
	in a,(0x99)		; wait until S#2 bit 5 is set: start of HBLANK
	and 0x20
	jp z,3b
1:	in a,(0x99)		; wait until bit 5 clears: end of HBLANK
	and 0x20
	jp nz,1b
2:	in a,(0x99)		; wait until bit 5 is set again: start of the next HBLANK
	and 0x20
	jp z,2b
	djnz 3b			; I loop three times, but I think it's not needed
	ld b,11
4:				; this delay helps adjust when the register is updated
	djnz 4b
	ld a,240		; visual indication: write 240 to R#7
	out (0x99),a
	ld a,7+128
	out (0x99),a

	ld a,(_r18Value)	; do the job: write (r18Value+1) AND 15 to R#18
	inc a
	and 15
	out (0x99),a
	ld a,128+18
	out (0x99),a

	ld a,0			; end of visual indication: write 0 to R#7
	out (0x99),a
	ld a,7+128
	out (0x99),a

This code is only meant to demonstrate that it can be done.
It runs while an LMMV command with the XOR logical operation is executing in an infinite loop. There is no glitch.

I've shared this because I think it is useful for understanding how the VDP works, but the code needs a lot of refinement.
Most probably the VDP syncs R18 at every scanline, whether in VBLANK or not.
Anyway, this command runs in VBLANK.
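
For readers following along in C, here is a minimal sketch of the byte patterns the assembly above sends to port 0x99. It only builds and checks values on the host; it does not touch hardware. The helper names (vdp_reg_write_bytes, s2_in_hblank, r18_next) are hypothetical and not part of the posted program.

```c
#include <assert.h>
#include <stdint.h>

#define VDP_REG_WRITE_FLAG 0x80u  /* the "128+" added to the register number */
#define S2_HBLANK_MASK     0x20u  /* the S#2 bit the wait loops test with AND 0x20 */

/* Fill out[] with the two bytes written to port 0x99, in order,
   to store `value` in VDP register `reg`: data first, then 128+reg. */
static void vdp_reg_write_bytes(uint8_t reg, uint8_t value, uint8_t out[2])
{
    assert(reg < 64u);  /* assumed: the register number must fit in 6 bits */
    out[0] = value;
    out[1] = (uint8_t)(VDP_REG_WRITE_FLAG | reg);
}

/* Nonzero when a byte read from S#2 has the HBLANK bit set. */
static int s2_in_hblank(uint8_t s2)
{
    return (s2 & S2_HBLANK_MASK) != 0;
}

/* Next R18 value as computed by the "inc a / and 15" pair. */
static uint8_t r18_next(uint8_t r18)
{
    return (uint8_t)((r18 + 1u) & 0x0Fu);
}
```

For example, selecting S#2 is the pair (2, 143) and an R18 update is the pair (value, 146), matching the constants 128+15 and 128+18 in the listing.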

By PingPong

Enlighted (4156)

24-12-2012, 12:04

Please ignore this post. The glitch is only reduced; after a while it appears again :-(

By hit9918

Prophet (2932)

24-12-2012, 19:50

Isn't a write to R18 delayed until HBLANK by the VDP itself?

My theory model:
At the start of the scanline, the VDP loads R18 into a delay counter.
The whole display gear is delayed by this number of cycles: display, sprites, and border.

Keyword "tail", by which I meant that blitter action has a tail that hangs into the next scanline:

What if one opcode iteration of a command is a non-interruptible unit?
Like, syncing to the DMA pattern can only happen at the start of an opcode.

Taking copy speed to be very roughly the speed of outi,
outi is 18 CPU cycles at 3.58 MHz, which on the 5.4 MHz dot clock would be about 27 pixels.

And then, if you change R18 from 15 to 0, from the point of view of the DMA pattern of the new scanline, the blitter could have an additional 15-pixel "tail" just from that.
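
As a rough check of the cycle-to-pixel arithmetic, here is a small sketch. It assumes the nominal MSX2 clocks, which are not stated in the thread: a 21.477 MHz master clock, a Z80 clock of master/6 (3.58 MHz) and a dot clock of master/4 (5.37 MHz) in 256-pixel modes. Under those assumptions one CPU cycle spans 1.5 dots. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed nominal clock ratios (see the note above). */
#define MASTER_PER_CPU_CYCLE 6u  /* Z80 clock = master clock / 6 */
#define MASTER_PER_DOT       4u  /* dot clock = master clock / 4 */

/* Convert a duration in CPU cycles to dots (pixels), rounding down. */
static uint32_t cpu_cycles_to_dots(uint32_t cpu_cycles)
{
    assert(cpu_cycles <= UINT32_MAX / MASTER_PER_CPU_CYCLE);
    return cpu_cycles * MASTER_PER_CPU_CYCLE / MASTER_PER_DOT;
}

/* Convert a duration in dots to CPU cycles, rounding down. */
static uint32_t dots_to_cpu_cycles(uint32_t dots)
{
    assert(dots <= UINT32_MAX / MASTER_PER_DOT);
    return dots * MASTER_PER_DOT / MASTER_PER_CPU_CYCLE;
}
```

With these ratios, cpu_cycles_to_dots(18) gives 27 for one outi, and the 15-dot swing of R18 corresponds to dots_to_cpu_cycles(15) = 10 CPU cycles.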

By hit9918

Prophet (2932)

24-12-2012, 20:29

So, another syncing hypothesis:
The blitter responds to a wait signal at the start of every command iteration.

At the start of a scanline, the display gear drives the wait pin high for some number of cycles, and as that pin goes low again, the blitter is synced properly.
Kind of an HBLANK pin for the blitter. Or maybe call it HWAIT.

And when the blitter comes out of the previous scanline with a slower DMA pattern, with a changed R18 it misses the event.

And it is not so clear where the end of a scanline is.
With a change of R18, from the point of view of DMA patterns, a scanline can get shorter by 15 pixels!

And that is what the blitter can't handle when it is running its slow DMA pattern; that is the theory.

By PingPong

Enlighted (4156)

24-12-2012, 21:29

@hit9918, let's continue. First, the code I pasted is wrong (copy-paste error). The real code is similar.
So let's continue with tests. I've done other investigations, but I was extremely confused by the results. I summarize them here:
First, during interrupts I'm running, in an infinite loop, a logical copy operation from page 0 to page 1 with the XOR logical operation. So normally you see the screen alternately filled with 255 and 0 (255 xor 255, then 255 xor 0). That is effectively what I get.
First test: at scanline 156 I fire a line interrupt. My code does this:
(Can you check whether I'm doing something wrong?)


void waitscanlines()
{
#asm
	ld c, 0x99
	ld d, 240
	ld e, 7+128
	ld a,2 			; read S#2
	out (0x99),a
	ld a,128+15
	out (0x99),a	;	
1:
	in	a,(0x99)		;wait until start of active area
	and	0x20
	jp	nz,1b

2:
	in	a,(0x99)		;wait until start of HBLANK
	and	0x20
	jp	z,2b

	#endasm
}

void scr_off() // turn off the screen 
{
#asm
	ld a, 0x22
	out (0x99),a
	ld a, 1+128
	out (0x99),a
#endasm
}


void scr_on() // turn on the screen
{
#asm
	ld a, 0x62
	out (0x99),a
	ld a, 1+128
	out (0x99),a
#endasm
}

void lineint()
{

	scr_off(); // screen off
	waitscanlines(); // wait x3, to be sure
	waitscanlines();
	waitscanlines();
	r18Value = (r18Value+1) & 15; // update r18 value, then write
	#asm
		ld a, (_r18Value)
		out (0x99),a
		ld a, 128+18
		out (0x99),a
	#endasm
	scr_on();	// screen  on
}


If I put this code in a line interrupt at approximately line 156, I get almost no corruption. After 30 seconds, however, some random points start to appear. The strange thing is that the glitches are ALL placed at different Y values but ALL at X=0!
After a minute you have almost 10 of those glitches, apparently placed at random Y positions and all with X=0.

By contrast, if the code is executed in VBLANK, there is a LOT of corruption! In the first 10 seconds I get almost 1000 errors over the entire screen at random X, Y values.

The code is the same. I'm starting to get very confused.
Theoretically, VBLANK should be a safer place...

Any proposals or tests?

By hit9918

Prophet (2932)

25-12-2012, 11:05

In the other thread, Maggoo said that turning sprites on made something better.

One can imagine that having both display and sprite DMA on is better than having only one of them on.
Because it seems to be a funny modulo game: the cycles an opcode takes in a certain DMA mode versus the situation it hits when a changed R18 makes the scanline shorter (from a DMA point of view).

Further, every time the blitter draws the next line, it has a per-line overhead.
This adds further chaos to testing the issue.

The scroll is maybe a worst case, with small width and R18 values changing most wildly (e.g. a change from 15 to 0) ...

Hey, wait a minute. What if the score panel is at R18 position 15!?
Then a scroller above it will change R18 from 0->15, 1->15, ... 15->15.
I.e. the scanline always gets longer (DMA-wise), not shorter.
Maybe this works without wreck.

However, to be sure, after the panel one needs to clear the situation with ARTRAG's method.
Because on the other side, in the transition from panel back to scroller, R18 can again go from 15 to 0.
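
The "always longer, never shorter" argument can be checked with a tiny sketch. It takes the simplified model used in this post at face value, namely that R18 behaves as a plain 0..15 delay count; that is an assumption of the model, not a verified hardware fact. The function names are hypothetical.

```c
#include <assert.h>

/* Change in scanline length (in dots, DMA-wise) when R18 goes from
   old_r18 to new_r18 under the linear-delay model:
   positive = the line gets longer, negative = it gets shorter. */
static int r18_line_length_change(int old_r18, int new_r18)
{
    assert(old_r18 >= 0 && old_r18 <= 15);
    assert(new_r18 >= 0 && new_r18 <= 15);
    return new_r18 - old_r18;
}

/* Returns 1 if switching from any scroller value 0..15 to panel_r18
   never shortens the scanline, 0 otherwise. */
static int panel_never_shortens(int panel_r18)
{
    for (int scroller_r18 = 0; scroller_r18 <= 15; scroller_r18++) {
        if (r18_line_length_change(scroller_r18, panel_r18) < 0) {
            return 0;
        }
    }
    return 1;
}
```

Under this model only a panel at 15 passes the check for every scroller value, and the way back (15 to 0) is the worst case with a change of -15, which is why the return transition still needs separate handling.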

By hit9918

Prophet (2932)

25-12-2012, 11:15

But for practical purposes, a gap between a scroll and a panel doesn't matter, while parallax scroll wants things without gaps.

So I had hoped that the blanking fix would work with only turning sprites off and leaving the display on.
But then I heard that this might make the situation even worse.

By PingPong

Enlighted (4156)

25-12-2012, 18:50

@hit9918: I have another theory.
My idea is that the blitter only runs during the active area and in VBLANK, not in HBLANK.
I think that in HBLANK the VDP is extremely busy serving both Z80 access and sprite SAT prefetch (to select the 8 sprites to display out of the 32 available). I guess that in HBLANK the blitter is stopped, then restarted and aligned to R18 just before entering the active area.

During VBLANK there are no sprites, so I suspect this mechanism is disabled. The blitter runs all the time, so changing R18 is riskier, because I think that during VBLANK the blitter is not stopped, realigned, and restarted.
This explains why there are more problems in VBLANK.
It also explains the "almost no corruption" when changing during HBLANK. I've found a bug in my code: there is a possibility that I try to change R18 in the active area. This can happen if the Z80 takes too long to react to the line interrupt. Because the main loop of the program uses a sequence of DI, 16x outi, EI to feed the VDP continuously with commands, there is a chance that some R18 updates fall into the active area. I need to fix the routine and verify.
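
To put a rough number on that interrupt latency, here is a small sketch. It assumes 228 CPU cycles per scanline (1368 master clocks divided by 6, a figure not stated in the thread) and the 18 cycles per outi quoted earlier. It counts only the outi instructions, so it is a lower bound on the time interrupts stay masked. The names are hypothetical.

```c
#include <assert.h>

#define CPU_CYCLES_PER_SCANLINE 228u  /* assumed: 1368 master clocks / 6 */
#define OUTI_CYCLES             18u   /* figure quoted earlier in the thread */

/* Lower bound on the cycles during which interrupts are masked by a
   DI, N x outi, EI sequence (DI and EI themselves are not counted). */
static unsigned masked_cycles_lower_bound(unsigned outi_count)
{
    assert(outi_count <= 1000u);  /* keep the product well inside range */
    return outi_count * OUTI_CYCLES;
}

/* Largest number of outi instructions whose total time still fits
   within one scanline. */
static unsigned max_outi_within_one_scanline(void)
{
    return CPU_CYCLES_PER_SCANLINE / OUTI_CYCLES;
}
```

With 16 outi the masked window is at least 288 cycles, longer than a whole 228-cycle scanline, so under these assumptions a line interrupt could be served more than one scanline late. Only bursts of 12 outi or fewer stay within a single line.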

By sd_snatcher

Prophet (3675)

25-12-2012, 22:53

Since the approach of stopping the blitter, described here, doesn't work properly and the V/HBLANK approach is also giving you trouble, maybe a more scientific solution can be useful.

If we decompose the problem, we have two independent cases competing for the same resources:

1) To blit tiles on the background buffer
2) To change the VDP R#18 and flip the buffers

And each case competes for the following resources, respectively:

1) CPU+blitter
2) CPU+VBLANK+(no corruption)

Note: Changing the VDP R#18 and flipping the buffers requires VBLANK to avoid tearing.

Well, since we know that the only 100% sure case of "no corruption" is to have the blitter stopped, (2) can be expressed as:

2) CPU+VBLANK+(blitter stopped)

For those who studied computer science, this problem is very similar to the dining philosophers problem, with 2 philosophers dining. So maybe we can try the known solution to that problem.

Since you're blitting tiles of either 8x8 or 16x16, the idea is to blit only a known number of tiles that the VDP can complete in less than one frame (one philosopher can't eat forever), then leave the rest of the frame time free for the CPU to change the VDP R#18 on the vertical interrupt. After changing R#18, the vertical interrupt handler signals a semaphore that releases a new set of blits by the main thread (outside the interrupt handling routine). The pseudocode would be like this:

[Initialization]
1) Set n to a given number of tiles, e.g. n=20.
2) Release the semaphore so the main thread can start blitting.

[Main thread]
1) Issue a set of n 8x8 (or 16x16) tiles to the blitter.
2) Change the semaphore value. This means the main thread will wait until the next cycle.
3) Wait for the semaphore value to change, which allows it to process the next set of tiles.
4) Loop to (1) to process the next set of tiles.

[Interrupt Handler routine]
1) Check the VDP blitter running status flag.
2) If the blitter is still running, decrease n, so that the next cycle of the main thread runs a smaller set of tiles, then wait until the blitter finishes its job.
3) Change the VDP R#18 value and flip the buffers.
4) Release the semaphore to allow the main thread to continue blitting.

This approach seems very easy to implement and is not prone to race conditions, regardless of blitter or CPU speed. Each time the blitter fails to finish the given set of tiles, the set is automatically reduced for the following frames, until the system reaches a balance. This balance will be reached very quickly, so only a few frames will be torn. This even allows the system to work well with normal and turbo CPUs or blitters.
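
The pseudocode above can be sketched as a small host-side simulation. This is only a model under stated assumptions: the blitter is reduced to a counter of pending tiles, its per-frame capacity is a made-up parameter, and "wait until the blitter ends its job" is modelled by clearing that counter. On real hardware the busy test and the R#18 write would go through the VDP ports. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int  n;            /* tiles issued per frame */
    int  pending;      /* tiles still queued in the simulated blitter */
    bool may_blit;     /* the semaphore */
    int  r18;          /* last value "written" to R#18 */
    int  torn_frames;  /* frames in which the blitter overran */
} Scroller;

/* [Initialization] */
static void scroller_init(Scroller *s, int n)
{
    assert(n >= 1);
    s->n = n;
    s->pending = 0;
    s->may_blit = true;   /* release the semaphore */
    s->r18 = 0;
    s->torn_frames = 0;
}

/* [Main thread], one pass of the loop */
static void main_thread_step(Scroller *s)
{
    if (!s->may_blit) return;  /* step 3: still waiting */
    s->pending += s->n;        /* step 1: issue n tiles */
    s->may_blit = false;       /* step 2: take the semaphore */
}

/* Simulated blitter: completes up to `capacity` tiles per frame. */
static void blitter_run_frame(Scroller *s, int capacity)
{
    s->pending = (s->pending > capacity) ? s->pending - capacity : 0;
}

/* [Interrupt Handler routine] */
static void vblank_handler(Scroller *s)
{
    if (s->pending > 0) {           /* steps 1-2: blitter still running */
        if (s->n > 1) s->n--;       /* smaller set next frame */
        s->torn_frames++;
        s->pending = 0;             /* models waiting for the blitter */
    }
    s->r18 = (s->r18 + 1) & 15;     /* step 3: change R#18 */
    s->may_blit = true;             /* step 4: release the semaphore */
}

/* Run `frames` frames and return the final tile budget n. */
static int run_frames(Scroller *s, int capacity, int frames)
{
    for (int i = 0; i < frames; i++) {
        main_thread_step(s);
        blitter_run_frame(s, capacity);
        vblank_handler(s);
    }
    return s->n;
}
```

Starting from n=20 with a simulated capacity of 12 tiles per frame, the budget drops by one per overrun frame and settles at 12 after 8 torn frames. A larger decrement would converge faster at the cost of possibly undershooting the capacity.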

By sd_snatcher

Prophet (3675)

25-12-2012, 22:50

The above approach can be adjusted for the game frame rate, (i.e. 60fps, 30fps or 20fps) by inserting a frame dropper before the step (1) of the [Interrupt Handler routine].

By PingPong

Enlighted (4156)

26-12-2012, 00:31

sd_snatcher wrote:

The above approach can be adjusted for the game frame rate, (i.e. 60fps, 30fps or 20fps) by inserting a frame dropper before the step (1) of the [Interrupt Handler routine].

Thanks for illustrating the entire process, but I think we want a constant 60 fps rate... :-(
