Also, the Amstrad CPC proved that the Z80 can be successfully synced with memory-mapped video hardware.
It would be nice to calculate, for all those micros, the average percentage drop in speed compared to uncontended access.
From what I know, the Amstrad suffers quite a lot. The problem is not in the Z80; it is in the DRAM and its 8-bit data bus. I.e., the total RAS time mustn't be violated when the CRTC (display DMA) accesses the RAM together with the Z80. Performance-wise, the execution speed is also quite irregular. The finest software requires "timing alignment" with the wait-state insertion mechanism, plus instruction-flow optimization. IMHO that is possible for demo coders, but difficult for game design.
Amstrad's designers were very well aware of the bus-throughput problems. Therefore, the sprites introduced in the PLUS models required no RAM access at all: all the sprite bitmaps and attributes were stored in on-chip memory inside the ASIC.
A later question: do you think it's possible to interface a VIC-II chip with a Z80-based computer without losing too much on the CPU side?
IMHO yes, but Commodore did quite well with their C128. From what I've read/heard, they did some bus-timing alignment. I can check this out, but also from what I know, the Z80 was introduced there to run commercial CP/M software at speed. So the VIC-II (C128 version) is already well connected to the Z80. To get better performance, the whole VIC-II memory subsystem would have to be redesigned (going towards PSRAM, for example, or a 16-bit bus).
- Slot selecting: they implemented the main slots in a reasonable way, but for some unknown reason they made sub-slot selection memory-mapped instead of using I/O. This is still today the biggest single issue causing software incompatibility, and it makes changing slots a real pain.
Yep, the MSX slot mechanism is too complex. And what's more: irregular. You can have an expanded slot, or not; one slot can be expanded, another not. That forces complex software to handle all the cases.
Much better would have been like a single large mapper, with a cartridge slot simply being a 64K 'window' inside that address space. Preferably at a fixed location.
Flaw: on a 64 kB MSX1 you can use at most 28815 bytes for your BASIC program... quite uncompetitive with the other machines of the time.
+1. Totally unnecessary limitation if you're using bankswitching at all. For example 8K ROM + some mechanism to (temporarily) page in additional ROM blocks would have been easy to implement.
Hardware scrolling missing on MSX2. (This is a home computer!)
I never understood why the V9938 didn't offer full hardware support for scrolling. With all the features included in the V9938, this would have been trivial to add, and very, very useful for software.
Also, from what I gather, video bandwidth is reduced by the number of sprite-related data fetches from VRAM. Why not keep the sprite attributes inside the VDP? Perhaps not the sprite bitmaps themselves, but at least the sprite attributes in a set of internal VDP registers: much improved VRAM access patterns, at the cost of very little VDP logic.
To be fair: if you don't want to waste CPU cycles on screen output, you need dedicated hw to generate the screen. And that hw needs priority access to the RAM that holds the screen data.
Whether that RAM is shared with CPU or not, isn't too relevant then. One way or the other: if the CPU wants to update screen data, and VDP needs to read data for screen output, the CPU will have to wait. Performance-wise, does it really matter then whether that's dedicated RAM attached to VDP or included in CPU's RAM space with video hw putting the CPU on hold?
VDP with dedicated RAM on I/O port(s) is one way. VRAM + CPU RAM in shared address space is another (doesn't Amiga do a similar thing?).
The only other way is dual-port RAM, which was 'non-standard' and very expensive back in the day.
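The argument above can be put in rough numbers; a minimal sketch with assumed figures (the 250 ns cycle and 50% video share are illustrative, not from the thread):

```python
# Illustrative numbers only: whatever fraction of memory cycles the
# display fetch consumes is lost to the CPU, whether that RAM is
# dedicated VRAM or shared system RAM with the CPU put on hold.
MEM_CYCLES_PER_SEC = 4_000_000   # e.g. one access per 250 ns DRAM cycle
video_share = 0.5                # fraction of cycles taken by the display

cpu_cycles = int(MEM_CYCLES_PER_SEC * (1 - video_share))
print(cpu_cycles)                # accesses per second left for the CPU
```

Either way, the CPU sees only the leftover bandwidth; the partitioning of the address space doesn't change the total.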
Other than the above, IMHO the major issue was successive MSX standards not evolving with technological progress. MSX2 or 2+ should have been a new standard, with a 16- or 32-bit CPU and much more powerful video/audio capabilities. Non-backwards compatible? Yep: no pain, no gain.
I think the more CPU/VDP registers there are, the more complex and the more expensive the chip becomes.
@maxis: the Z80 integration on the C128 is not well done. In practice the CPU is clocked at 4 MHz but stopped for half the time (to emulate the sync mechanics that are easy to do on the 65xx family). It's like a Z80 at 2 MHz :-(
I was wondering if one can do better.
I assume that using BUSACK/BUSRQ/WAIT and the VIC-II signals this could be done, but I'm not sure about the sync problems due to the VIC/Z80 timing. Are the VIC-II signals asserted early enough to allow the Z80, in the worst case, to acknowledge BUSRQ?
I've read that the VIC asserts the BA & AEC lines 3 cycles early for the 6502, to give the latter time to acknowledge. I don't know whether that is enough time for a Z80 too.
See, it can always be done better. The problem is deciding when it is "good enough", and unfortunately it is not the engineers who decide that.
So, here are two ways to achieve faster access:
1. Running at the full RAS cycle of the dynamic memory. Let's take that era's standard 64K×1 4164 chip, say a 4164-12. The fastest full read cycle is 230 ns; we can round it up to 250 ns for easy calculation. I.e., with such memory the maximum random-access throughput is 4 MBytes/s. If we split the cycles and synchronize, we have only 2 MBytes/s for the CPU and 2 MBytes/s for the graphics. The Z80 has irregular timing and no prefetch, so let's implement a small storage (a two-byte cache) to accelerate program access.
This method was never used in home computers. IMHO, too expensive.
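The arithmetic for method 1 can be sanity-checked with the rounded 250 ns figure from the post:

```python
# Checking the numbers above for method 1: a 4164-12 DRAM with the
# full random-access read cycle rounded up to 250 ns.
CYCLE_NS = 250

# One byte per cycle on the 8-bit bus.
total_bytes_per_sec = 1_000_000_000 // CYCLE_NS
# Splitting cycles 50/50 between CPU and video:
cpu_share = total_bytes_per_sec // 2
print(total_bytes_per_sec, cpu_share)   # 4000000 2000000
```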
2. Early in the 80s, DRAM had already supported page mode for quite a while (actually since 1977)! So the video controller could read the whole scan line into a single line buffer at greater speed, since all the bytes are sequential. Then the memory is free for CPU access. In page mode, access within the same 256-byte page is almost 2 times faster than in random-access mode.
Now, by using such a technique, the video throughput goes 2x higher, and the CPU throughput too! Just a small line memory to hold 512 bytes, and another one in ping-pong organization. And for the CPU it is enough to compare whether the next address still resides in the current 256-byte page.
This method was never used in home computers either, since it is even more expensive. In some rare arcades, IMHO.
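The CPU-side check described in method 2 boils down to comparing the high address bits; a minimal sketch (function name is illustrative):

```python
def same_page(prev_addr: int, next_addr: int, page_bits: int = 8) -> bool:
    """True when both addresses fall in the same 256-byte DRAM page
    (same row), so the faster page-mode cycle can be used."""
    return (prev_addr >> page_bits) == (next_addr >> page_bits)

# Sequential accesses inside one page keep the fast cycle:
assert same_page(0x4010, 0x4011)
# Crossing a page boundary forces a full RAS/CAS cycle:
assert not same_page(0x40FF, 0x4100)
```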
I don't recommend using BUSREQ/BUSACK, since it takes more time than WAIT. With WAIT the control bus doesn't have to be released, and releasing it costs too much time.
OK, but do you think it was feasible to use those signals with the DRAM chips of the era?
I think on the C128 they used those signals, but the CPU is still stopped 50% of the time.
What I mean is: is it possible to use every drop of time, instead of the simplified approach of stopping the CPU 50% of the time?
Someone commented that slot expanders not being mapped to I/O ports is a flaw... The actual reason for that is simple:
If the slot expander were mapped as I/O, it would not be possible to have each slot expander retain its state independently. Poking one would change the state of all the others, exactly as happens with the Memory Mapper. And that would be bad, not good, as it would force programs to be more complex... Exactly the same reason why nobody (sane) made I/O-mapped floppy-drive interface cartridges.
So, guys... instead of complaining, why not put some thought into why it was designed the way it was?
I am sure ya all will be able to come up with very reasonable conclusions for each of those "flaws"...
Yes, you can put my name on this...
Now, each main slot has its own sub-slot register indeed... So what do we gain from this "optimization"?
If you ask me, the answer is "absolutely nothing!". It might make things simpler if there were a standard saying that, e.g., BIOS, RAM and FDD must be in different main slots... but there is no such standard (which is good, BTW). This means that every program has to make sure that both the main-slot and sub-slot selections are changed every time it wants access... and because the sub-slot register is independent for each slot, this makes the task next to hell, and the optimizations you suggest are pure bogus, since they cannot be applied at any time if you want to keep compatibility between MSX computers. OK, sure, if you are good enough you can write a routine that builds a map of the hardware and generates routines on the fly for maximum performance, but even in the best case that makes programs incompatible (in speed) with other machines. End of story. Nothing good can be gained from this design... or at least I challenge you to give any reasonable use case.
One reason I can think of is the way the slot management routine in BIOS is designed.
You can call it directly and it acts on the slots. The slot expander retains its own state, so you don't need to allocate a byte to cache that state: the routine can just read the state register and act upon its contents. If all slot expanders reacted to the same write at once, you would need to keep track of their state when returning control to a user program, which would make programs more complex. The whole point is making it possible for each slot expander to remember its own state, so that writing to one won't disturb the others. The reason the register sits at the last byte of memory is simply to keep the expander's logic design simple, plus programming reasons.
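The independence argument can be sketched in a toy model (class and function names are illustrative; the complemented read-back of the register at FFFFh is the documented MSX behavior):

```python
# Toy model of the behavior described above: each expanded primary slot
# has its own sub-slot register, visible at FFFFh only while that
# primary slot is selected for page 3. Writing one expander's register
# leaves the others untouched, and the register can be read back, so a
# program needs no shadow copy of the state in RAM.
class SlotExpander:
    def __init__(self) -> None:
        self.subslot_reg = 0x00   # two bits per 16K page

expanders = {0: SlotExpander(), 3: SlotExpander()}
current_primary = 3               # primary slot currently mapped to page 3

def write_ffff(value: int) -> None:
    expanders[current_primary].subslot_reg = value & 0xFF

def read_ffff() -> int:
    # The real register reads back complemented, which also lets
    # software detect whether the selected slot is expanded at all.
    return expanders[current_primary].subslot_reg ^ 0xFF

write_ffff(0xAA)                          # configure slot 3's expander...
assert expanders[0].subslot_reg == 0x00   # ...slot 0 stays untouched
assert read_ffff() == 0x55                # complemented read-back
```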
A lot of other things have similar reasoning. For example, the way I/O addresses were picked, and why each onboard device's I/O port was allocated within a space of 8 addresses (e.g. the PSG can be seen in the whole range A0h-A7h on most MSX1 computers): it saved logic and cut costs.
I know most things have a plausible reason to be, but sub-slot selection looks a lot like jugglery. You have to switch out the slot where the Z80 keeps its stack, poke something, then switch the RAM slot back; and if you forget to disable interrupts you're doomed, because the PC value the Z80 tried to store at the top of the stack goes to limbo, and when it reads it back it will probably get address FFFF, and what it does next is unpredictable.
OTOH, the slot feature on MSX is maybe the best workaround for connecting several devices in the narrow address space of most 8-bit computers. (Commodore cheated here, hard and ugly! They owned the 6502 chip factory, so they just created another 6502 with more address pins, but those pins were driven indirectly, just as if there were bank-switching circuitry outside the processor.) So we may have a hard time finding a better solution without losing functionality.