Fray: MSX2 vs TurboR

Page 1/2

By PingPong

Enlighted (4136)

12-07-2021, 01:14

Hi, I've seen two versions of Fray, one for MSX2 and the other for TurboR.

I'm wondering whether the TurboR version is optimized in some way compared to the MSX2 one.
I've noticed that the "window scroll" on the TurboR is way faster than in the MSX2 version. This is unclear to me, because of the infamous VDP I/O access delay. I guess the engine does a bunch of OUTI instructions, skipping the tiles that are unchanged between frames. But if my guess is correct, and assuming the bottleneck is either the VDP blitter speed or the 8 µs OUTI brake, the Z80 should be at least as fast as the R800, because:
a) the raw VDP blitter speed is the same
b) an OUTI that fills the VDP blitter registers takes far less time on the MSX2, which does not have the 8 µs delay

So:
- if the VDP blitter speed is the bottleneck, the scroll speed should be the same
- if the 8 µs VDP port I/O access delay is the bottleneck, the R800 should perform worse than the MSX2

However, the evidence is that the scroll is faster on the R800 than on the MSX2.

This makes me think that on the MSX2 the VDP blitter is "underused".
Can anyone confirm my assumptions?
If this is the situation, maybe there is room to optimize the window scroll on the MSX2 and squeeze the maximum out of the VDP blitter (meaning keeping the VDP at 100% usage)?

Maybe there are other game engines that show this issue?

By gdx

Enlighted (6213)

12-07-2021, 02:06

The VDP speed is the same, but the CPU is much faster. While the VDP works, the CPU works too. When the CPU has more work to do than the VDP, there is a slowdown. And maybe the interrupt routine is too long sometimes.

By alexito

Paladin (761)

12-07-2021, 02:33

I was reading the code of some parts of the game, and what I discovered is that the extra 256KB of RAM is used to hold more graphics and strings of ADPCM audio samples, while the R800's speed is mainly used to decode the ADPCM data and output it through the PCM DAC.

By GhostwriterP

Paladin (683)

12-07-2021, 10:02

Isn't the difference in scrolling speed just a matter of the R800 being faster at sorting out which tiles need an update and which remain the same?

By gdx

Enlighted (6213)

12-07-2021, 10:25

Yes, this is what I said above in a more general way.

By GhostwriterP

Paladin (683)

12-07-2021, 10:51

I never built an engine like this, but I reckon that, apart from waiting for the blitter, a great deal of time is devoted to game logic and to maintaining "name tables" for the different layers (background, foreground, sprites) and "cell update" flags. Not all of this can be easily interleaved with the blitting process.

By PingPong

Enlighted (4136)

12-07-2021, 16:04

In other words, you are telling us that the Z80 cannot saturate the VDP because it is busy working out which tiles need to be updated and/or running other game logic, while the R800, being faster, can?

So practically the VDP is mostly "free".
It would be good to test this on openMSX with the Tcl script, to see how busy the VDP is kept in Z80 and R800 mode.

By Grauw

Ascended (10768)

12-07-2021, 16:21

In theory one could write code that perfectly parallelises the CPU and VDP; in practice, however, this is almost impossible except in demoscene-like, precisely timed and interleaved code. One of the reasons for this is that the VDP has no command completion interrupt, so it is difficult to optimally utilise both the CPU and the VDP. Therefore all games will see a speed-up on 7 MHz and R800.

So the proposition that if a game runs faster in turbo modes it must be not well optimised because the VDP should be the bottleneck, I don’t think it is a valid one. That is not to say of course that a game like Fray can’t be optimised, probably it can given enough time and effort. But perfectly interleaving the code such that the VDP is never idle is very difficult to do.

By inchl

Expert (83)

12-07-2021, 16:52

One option is to select status register #2 by default and check the CE bit during some frequently called code. You need to use interrupt mode 2 for this, since you need to read/clear status register 0 on the vblank. The polling overhead is minimal this way. Another trick is to set up multiple screensplits like in MMM and perform the check there :-)

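in a,(#99)                 ; read the selected status register (S#2)
rra                        ; move bit 0 (CE) into the carry flag
call nc,processVdpQueue    ; CE = 0: no command running, feed the next one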

By PingPong

Enlighted (4136)

12-07-2021, 17:31

A command-finished interrupt is certainly a missing feature (and I think it would have cost very little effort to add). However, in this situation, if the Z80 cannot feed the VDP enough work to keep it busy, I think even an interrupt feature is useless.

Here the problem is that the Z80 has to work too much to see whether the VDP needs to redraw a specific tile or not.
So, assuming the code is already optimized, if the Z80 can issue tile commands at a rate of, say, 10 tiles per second and the VDP could keep up even at 15 tiles per second, there would be no gain from a command completion interrupt, because the Z80 isn't wasting time checking whether the VDP is busy: the VDP is already free every time the Z80 has a command to feed it.
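
To put rough numbers on that argument, here is a minimal sketch. It assumes the VDP is simply idle whenever the CPU cannot supply commands fast enough; the function name `vdp_utilisation` and the doubled CPU rate are illustrative assumptions, not measurements from Fray.

```python
def vdp_utilisation(cpu_issue_rate: float, vdp_max_rate: float) -> float:
    """Fraction of time the VDP is busy.

    cpu_issue_rate: commands per second the CPU manages to issue
    vdp_max_rate:   commands per second the VDP could execute back to back
    """
    # The VDP cannot be more than 100% busy, hence the cap at 1.0.
    return min(1.0, cpu_issue_rate / vdp_max_rate)


# Figures from the post: the CPU issues 10 tiles/s, the VDP could do 15.
slow_cpu = vdp_utilisation(10, 15)

# Hypothetical CPU that issues commands twice as fast (assumption).
fast_cpu = vdp_utilisation(20, 15)

print(f"slow CPU: VDP busy {slow_cpu:.0%} of the time")
print(f"fast CPU: VDP busy {fast_cpu:.0%} of the time")
```

With the 10 and 15 tiles per second from the example, the VDP would be busy about two thirds of the time, and waiting for it costs the CPU nothing. Only once the CPU can issue commands faster than the VDP executes them does the VDP become the limit.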

By Grauw

Ascended (10768)

12-07-2021, 18:27

The core issue with bitmap games using 8x8 tiles is that to optimise the gameplay speed you want to avoid unnecessary VDP copies. However when the gameplay area is 256x128, you need to process 512 tiles which is pretty rough on the Z80 already.
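
The tile count, plus a back-of-the-envelope cycle budget, can be worked out as follows. The 3.579545 MHz clock is the standard MSX Z80 figure; the 60 Hz frame rate and the idea of scanning the whole map within a single frame are assumptions for illustration only (a game may well spread the work over several frames), and wait states are ignored.

```python
AREA_WIDTH, AREA_HEIGHT = 256, 128   # gameplay area in pixels (from the text)
TILE_SIZE = 8                        # 8x8 tiles (from the text)

tiles = (AREA_WIDTH // TILE_SIZE) * (AREA_HEIGHT // TILE_SIZE)

Z80_HZ = 3_579_545   # standard MSX Z80 clock
FRAME_HZ = 60        # assumption: 60 Hz display

# Assumption: the whole tile map is scanned once per frame.
cycles_per_frame = Z80_HZ // FRAME_HZ
cycles_per_tile = cycles_per_frame // tiles

print(tiles)             # 512
print(cycles_per_frame)  # 59659
print(cycles_per_tile)   # 116
```

Under those assumptions there would be little more than a hundred Z80 cycles per tile for everything, which is only a handful of instructions.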

Let’s assume the game renders to a tile map buffer, which is then drawn in another pass which compares it to the previous buffer and issues a draw command if needed. Let’s assume it can check about 10 tiles while the VDP draws one. When two adjacent tiles have changed the CPU needs to wait for the VDP before it can draw the second tile, while if a bunch of tiles are unchanged the VDP will be idling until the next changed tile. In the latter case the turbo CPU improves performance.
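
This scenario can be modelled in a few lines of Python. It is only a sketch under stated assumptions: time is in arbitrary units, a tile check costs 10 units on the Z80 and a VDP tile copy 100 (the "10 checks per draw" ratio above), the faster CPU is assumed to check tiles five times quicker, and the change pattern (two adjacent changed tiles followed by 30 unchanged ones in each row) is made up.

```python
def render_pass(changed, check_cost, draw_cost):
    """Simulate one compare-and-draw pass without any command queue.

    changed:    iterable of booleans, True if the tile must be redrawn
    check_cost: CPU time to compare one tile with the previous buffer
    draw_cost:  VDP time to copy one tile
    Returns (total_time, vdp_utilisation).
    """
    cpu_time = 0      # current time as seen by the CPU
    vdp_free_at = 0   # time at which the running VDP command ends
    vdp_busy = 0      # accumulated time the VDP spent copying
    for is_changed in changed:
        cpu_time += check_cost
        if is_changed:
            # The CPU stalls until the previous copy has finished.
            cpu_time = max(cpu_time, vdp_free_at)
            vdp_free_at = cpu_time + draw_cost
            vdp_busy += draw_cost
    total = max(cpu_time, vdp_free_at)
    return total, vdp_busy / total


# Made-up pattern: 16 rows of 32 tiles, 2 changed then 30 unchanged.
pattern = ([True, True] + [False] * 30) * 16

z80_total, z80_util = render_pass(pattern, check_cost=10, draw_cost=100)
turbo_total, turbo_util = render_pass(pattern, check_cost=2, draw_cost=100)

print(z80_total, f"{z80_util:.0%}")      # total is 6560 units
print(turbo_total, f"{turbo_util:.0%}")  # total is 3202 units
```

In this model the pass takes roughly half the time with the faster CPU although the VDP speed is identical, because on the slow CPU the VDP sits idle while the unchanged tiles are being checked.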

With a command completion interrupt you could queue up copies so that the CPU doesn’t need to stall waiting for the VDP, the VDP can empty the queue while the CPU is processing unchanged tiles, so it can use both the CPU and VDP reasonably optimally. And when the CPU is done it could move on to other tasks while the VDP is still running. Although there’s still a cost to queueing as well as interrupt handling, a command completion interrupt would make CPU-VDP parallelisation much easier.
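
As a rough illustration of what queueing would buy, the sketch below compares a stalling CPU with one that can queue copies. Everything here is hypothetical: the V9938 has no such interrupt, the time units are arbitrary, and `queue_cost` lumps the cost of queueing and interrupt handling into one made-up figure per command.

```python
def tile_pass(changed, check_cost, draw_cost, queue_cost=None):
    """Return the total time of one compare-and-draw pass.

    queue_cost=None: the CPU busy-waits for the VDP before each copy.
    queue_cost=n:    the CPU pays n units per copy to queue it and never
                     stalls; the VDP starts each queued copy as soon as
                     the previous one has finished.
    """
    cpu_time = 0
    vdp_free_at = 0
    for is_changed in changed:
        cpu_time += check_cost
        if not is_changed:
            continue
        if queue_cost is None:
            cpu_time = max(cpu_time, vdp_free_at)  # stall until VDP is idle
            start = cpu_time
        else:
            cpu_time += queue_cost                 # enqueue and carry on
            start = max(cpu_time, vdp_free_at)     # VDP picks it up later
        vdp_free_at = start + draw_cost
    return max(cpu_time, vdp_free_at)


# Made-up pattern: 16 rows of 32 tiles, 2 changed then 30 unchanged.
pattern = ([True, True] + [False] * 30) * 16

stalled = tile_pass(pattern, check_cost=10, draw_cost=100)
queued = tile_pass(pattern, check_cost=10, draw_cost=100, queue_cost=5)

print(stalled)  # 6560
print(queued)   # 5280
```

With these made-up costs the queued pass is about 20% shorter, since the CPU keeps checking tiles while the second copy of each pair waits in the queue. A higher `queue_cost` eats into that gain, which is the trade-off mentioned above.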

I’m not really complaining that we don’t have such an interrupt (no use crying over spilt milk), but without it, it’s nigh impossible to fully parallelise the CPU and VDP workload. Already a 50% parallelisation is quite an achievement I think.

A game also has to perform many other tasks related to controls, collision, enemies, attacks, script events, status UI, music, sound effects, etc. These are mostly CPU work, and the VDP will most likely be idle during them unless you have some large long-running copy that you can do in the meantime. This will of course also speed up significantly with a turbo CPU.

inchl wrote:

One option is to select status register #2 by default and check the CE bit during some frequently called code. You need to use interrupt mode 2 for this, since you need to read/clear status register 0 on the vblank. The polling overhead is minimal this way. Another trick is to set up multiple screensplits like in MMM and perform the check there :-)

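in a,(#99)                 ; read the selected status register (S#2)
rra                        ; move bit 0 (CE) into the carry flag
call nc,processVdpQueue    ; CE = 0: no command running, feed the next one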

That is a possible approach, but I don’t like the prospect of littering that code fragment throughout every subsystem of the game. Also you’d need to carefully balance between checking too often and too little, and for all that effort you still can’t reach 100% VDP utilisation… Maybe localise it to the tile drawing system only, just to absorb some of the irregularities in changed tiles, though implementing the queue also comes at a CPU cost, so I’m not sure it’d be worth it in the end.
