Accessing tables with Z80 ASM

Page 2/3

By Edwin

Paragon (1182)

10-07-2006, 18:49

> About IX and IY, does anyone actually use them in games? In the past few weeks I've looked at a few source codes and I've never seen them being used.

I used them frequently in U:U as well. I think any more complex project will benefit from them; in some places you just need the extra registers. I do tend to keep them out of loops, though. I also often use their 8-bit halves (IXH, IXL, IYH, IYL, via undocumented instructions) as storage or counters when I'm out of other registers.

By ARTRAG

Enlighted (6862)

10-07-2006, 19:43

Actually, many C compilers use IX or IY to manage the stack frame during function calls.

The stack frame is used both for local variables and for parameter passing, so the function's code accesses its local variables with many LD A,(IX+offset) instructions.

Automatic variables are likewise stored in the stack frame and accessed with LD A,(IX+offset).
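A minimal C model of that access pattern follows. The stack is a byte array and IX is an index into it. The frame layout (first parameter at IX+4, first local at IX-1) and the helper names are hypothetical and chosen only for illustration, since each compiler defines its own calling convention:

```c
#include <stdint.h>

/* Model of the 64K Z80 address space; the stack lives inside it. */
static uint8_t mem[65536];

/* LD A,(IX+d): d is a signed 8-bit displacement. */
static uint8_t ld_a_ix(uint16_t ix, int8_t d) {
    return mem[(uint16_t)(ix + d)];
}

/* LD (IX+d),A */
static void ld_ix_a(uint16_t ix, int8_t d, uint8_t a) {
    mem[(uint16_t)(ix + d)] = a;
}

/* Hypothetical frame layout, for illustration only:
     IX+4 : first 8-bit parameter (pushed by the caller)
     IX-1 : first 8-bit local variable                   */
enum { PARAM0 = 4, LOCAL0 = -1 };

/* Body of "uint8_t twice(uint8_t x) { uint8_t t = x + x; return t; }",
   assuming the prologue has already pointed IX at the frame. */
static uint8_t twice_body(uint16_t ix) {
    uint8_t x = ld_a_ix(ix, PARAM0);        /* LD A,(IX+4)          */
    ld_ix_a(ix, LOCAL0, (uint8_t)(x + x));  /* store local t (IX-1) */
    return ld_a_ix(ix, LOCAL0);             /* LD A,(IX-1)          */
}
```

Every parameter and local is reached through one base register plus a constant displacement. That is why compiled code is full of (IX+offset) operands.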

By PingPong

Prophet (3900)

10-07-2006, 20:17

@tokumaru: you need to switch your coding style when moving from 6502 assembly to Z80 assembly.
If you code for the Z80 with a 6502 mindset, it may work, but you will get poor performance.

Do not think that the IX and IY registers are useless. As ARTRAG said, they are quite useful for quickly addressing fixed structures like this one (in C):

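typedef unsigned char BYTE;

struct mystruct
{
    BYTE data;    /* offset 0 */
    int offset;   /* offsets 1-2 (assuming 16-bit int and no padding, as on typical Z80 C compilers) */
    char* p;      /* offsets 3-4 */
};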

These can be quickly "mapped" like this (for example):
ld ix,mystructaddr   ; IX points to the struct
ld l,(ix+1)          ; low byte of offset
ld h,(ix+2)          ; high byte: HL = value of offset
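Here is a minimal C model of that mapping. It assumes the struct has no padding, 16-bit int and pointers, and little-endian byte order. That layout is typical for Z80 C compilers but is an assumption here. The OFS_* names and load_hl_ix are hypothetical helpers:

```c
#include <stdint.h>

/* Byte offsets of struct mystruct under the assumed Z80 layout. */
enum { OFS_DATA = 0, OFS_OFFSET = 1, OFS_P = 3 };

/* Equivalent of  ld l,(ix+d) / ld h,(ix+d+1):
   the low byte is stored first, so HL = (ix+d) | (ix+d+1) << 8. */
static uint16_t load_hl_ix(const uint8_t *ix, int d) {
    uint8_t l = ix[d];
    uint8_t h = ix[d + 1];
    return (uint16_t)(l | (h << 8));
}
```

With the five bytes 07 34 12 00 C0 in memory, load_hl_ix(s, OFS_OFFSET) gives 0x1234 and load_hl_ix(s, OFS_P) gives 0xC000. No pointer arithmetic is needed because each displacement is a constant in the instruction.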

You are right about the post- and pre-indexed addressing of the 6502 (that is a true gem), but remember that it only works within a 256-byte boundary: when crossing the boundary you wrap back to the start (and pay an extra cycle for the page crossing). So you can use lookup tables only in page 0 (the first 256 bytes), since the 6502 lacks a way to specify the base address.
With the Z80, instead, you can use H as the high-order byte of the address and L as the low-order byte. This lets you place your lookup table virtually anywhere in the 64K address space. At some extra cost you can also have a lookup table larger than 256 bytes.
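The H/L trick can be sketched in C by modeling the 64K address space as an array. The table page 0x80 (address 0x8000) is a hypothetical choice. The only real requirement is that the table starts on a 256-byte boundary:

```c
#include <stdint.h>

static uint8_t mem[65536];  /* model of the 64K address space */

/* With a 256-byte-aligned table the Z80 can do
       ld h,TABLE_PAGE   ; high byte = page holding the table
       ld l,a            ; low byte  = index
       ld a,(hl)
   so the address is simply (H << 8) | L, with no addition. */
static uint8_t table_lookup(uint8_t h, uint8_t l) {
    return mem[(uint16_t)((h << 8) | l)];
}

/* INC L only changes the low byte: stepping past index 255
   wraps to index 0 of the same page instead of carrying into H. */
static uint8_t inc_l(uint8_t l) {
    return (uint8_t)(l + 1);
}
```

Because the index is just the low byte, moving the table means changing only the value loaded into H.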

Regarding speed: I don't know which platform you programmed the 6502 on, but remember this:

The C64 had a 6510 (a 6502 derivative) inside;
the MSX, CPC and ZX Spectrum had a Z80.

Software developed for the Z80 typically outperformed the 6502 wherever computational power was required (3D/isometric games and chess games were generally quicker on the ZX than on the C64).

If you try to implement multiplication via a lookup table on the Z80, you don't get the best results. It is faster to use one of the public-domain routines available on the net.
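PingPong doesn't name a specific routine. The following is a C model of the classic shift-and-add method that small Z80 multiply routines are commonly based on. It is not a translation of any particular published routine:

```c
#include <stdint.h>

/* 8x8 -> 16-bit unsigned shift-and-add multiplication.
   For each multiplier bit, from most to least significant, the running
   product is doubled (ADD HL,HL on the Z80) and the multiplicand is
   added when the bit is set (ADD HL,DE). */
static uint16_t mul8x8(uint8_t a, uint8_t b) {
    uint16_t product = 0;
    for (int bit = 7; bit >= 0; bit--) {
        product = (uint16_t)(product << 1);        /* ADD HL,HL */
        if (b & (1u << bit)) {
            product = (uint16_t)(product + a);     /* ADD HL,DE */
        }
    }
    return product;
}
```

For example, mul8x8(13, 11) returns 143. The loop needs no table memory and only eight iterations, which is why such routines compete well with lookup tables on the Z80.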

The most important thing is to make a "switch" in how you think about an algorithm. As a general rule, I can give you these hints:

When you choose a register to hold a frequently used value, keep these rules in mind:

Do not choose A. It is the register most likely to be destroyed, because several instructions only work with it (for example, 8-bit addition: you cannot add B to C and get the result in B; one of the operands must be A).

When doing memory accesses, use DE first, then BC, then HL, for the same reasons. If you run out of registers, consider also using IX and IY or switching register banks. In almost all cases these techniques are faster than, or at least as fast as, doing memory accesses the 6502 way.

The main problem for someone coming from 6502 assembly is that Z80 assembly is not orthogonal, while 6502 assembly almost is.
You will have some difficulties at first because of this.

By tokumaru

Expert (83)

10-07-2006, 22:30

> @tokumaru: you need to switch your coding style when moving from 6502 assembly to Z80 assembly.
> If you code for the Z80 with a 6502 mindset, it may work, but you will get poor performance.

Yeah, I know that. I do want to be able to write Z80 code fluently, as I can with 6502. In fact, this was one of my concerns when I started learning the Z80: I was afraid of forgetting the 6502 way of doing things. Hope I can manage both! =)

> Do not think that the IX and IY registers are useless. As ARTRAG said, they are quite useful for quickly addressing fixed structures like this one (in C)
I was just arguing the "quickly" part, as the instructions that use IX and IY are quite slow.

> You are right about the post- and pre-indexed addressing of the 6502 (that is a true gem), but remember that it only works within a 256-byte boundary: when crossing the boundary you wrap back to the start (and pay an extra cycle for the page crossing). So you can use lookup tables only in page 0 (the first 256 bytes), since the 6502 lacks a way to specify the base address.
I can see advantages in the indexed addressing of both processors, but I can also see flaws in both. Neither is perfect.

> With the Z80, instead, you can use H as the high-order byte of the address and L as the low-order byte. This lets you place your lookup table virtually anywhere in the 64K address space. At some extra cost you can also have a lookup table larger than 256 bytes.
I just regret that when using HL to access tables you almost always have to perform an addition. The way you mentioned only works when loading 1-byte values. For anything larger you either have to INC L to reach each byte (worse, since you'd have to multiply the original index by the size of each entry to find the first byte of the value), or you can split the table into several tables, one holding all the first bytes, the next all the second bytes, and so on, and INC H to move from one table to the next. Either way you have to do some math.
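The two layouts tokumaru describes can be contrasted in C. The names, sizes and sample values are hypothetical. The split version models the two 256-byte pages as a two-row array:

```c
#include <stdint.h>

#define ENTRIES 4

/* Interleaved table of 16-bit values: entry i occupies bytes 2*i and
   2*i+1, so the index must first be scaled by the entry size. */
static const uint8_t interleaved[ENTRIES * 2] = {
    0x10, 0x01,  0x20, 0x02,  0x30, 0x03,  0x40, 0x04
};

static uint16_t read_interleaved(uint8_t i) {
    uint8_t ofs = (uint8_t)(i * 2);                  /* scale the index      */
    return (uint16_t)(interleaved[ofs] |             /* low byte             */
                      (interleaved[ofs + 1] << 8));  /* INC L, then high byte */
}

/* Split tables: all low bytes in one page, all high bytes in the next.
   The same index (L) works for both; INC H moves between the pages. */
static uint8_t split[2][256];

static uint16_t read_split(uint8_t i) {
    return (uint16_t)(split[0][i] | (split[1][i] << 8));
}
```

read_interleaved(2) returns 0x0330. With split[0][2] = 0x30 and split[1][2] = 0x03, read_split(2) returns the same value without scaling the index. The cost is that the split layout always reserves a full page per byte of entry width.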

> Regarding speed: I don't know which platform you programmed the 6502 on, but remember this:
>
> The C64 had a 6510 (a 6502 derivative) inside;
> the MSX, CPC and ZX Spectrum had a Z80.
>
> Software developed for the Z80 typically outperformed the 6502 wherever computational power was required (3D/isometric games and chess games were generally quicker on the ZX than on the C64).
I code for the NES (Nintendo Entertainment System). Its processor's clock is about half that of the MSX's Z80, which compensates for the fact that Z80 instructions take many more cycles. Also, the Z80 has much more complex instructions, while 6502 ASM is always about tiny little steps. The Z80 has instructions that copy from here to there, decrement one register and increment others all at once; the 6502 could never do that.
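The instruction described here matches LDI. LDI copies the byte at (HL) to (DE), increments HL and DE, and decrements BC. LDIR repeats it until BC is zero. The following C model shows their effect on memory and registers. Flags other than the BC-not-zero condition are ignored, and the struct and function names are hypothetical:

```c
#include <stdint.h>

/* Register state used by the block-transfer instructions. */
struct z80_regs {
    uint16_t hl;  /* source address      */
    uint16_t de;  /* destination address */
    uint16_t bc;  /* byte counter        */
};

/* LDI: (DE) <- (HL), HL++, DE++, BC--.
   Returns nonzero while BC is not zero (the P/V flag condition).
   mem must cover the full 64K address space. */
static int ldi(uint8_t *mem, struct z80_regs *r) {
    mem[r->de] = mem[r->hl];
    r->hl++;
    r->de++;
    r->bc--;
    return r->bc != 0;
}

/* LDIR: repeat LDI until BC reaches zero. */
static void ldir(uint8_t *mem, struct z80_regs *r) {
    while (ldi(mem, r)) {
        /* keep copying */
    }
}
```

Copying three bytes from 0x0100 to 0x0200 leaves HL = 0x0103, DE = 0x0203 and BC = 0. As on a real Z80, starting with BC = 0 wraps the counter and copies 64K bytes. On a 6502, each of these steps would be a separate instruction.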

So, yeah, I guess the Z80 can do more than the 6502; I'm not arguing that. I just miss some features I liked! =)

> If you try to implement multiplication via a lookup table on the Z80, you don't get the best results. It is faster to use one of the public-domain routines available on the net.
I like to build my own routines and understand 100% of what I'm using in my code. I don't feel comfortable using other people's code. I'm a control freak! O.o

> The most important thing is to make a "switch" in how you think about an algorithm.
I agree with you. I'm training to be able to do just that! Thank you for the tips. It is nice how on the Z80 you can do pretty complex tasks using only registers, without touching memory at all. The simplest tasks on the 6502 require you to use memory as temporary storage. Not that I'm complaining; I like both CPUs.

By dvik

Prophet (2200)

10-07-2006, 23:02

Using both register banks (the EXX instruction) can speed things up a lot. I almost never use the IX and IY registers because they are very slow.

Contrary to what PingPong said, I prefer using HL as the address register. That way you can load values into registers other than A (LD r,(HL) works with any 8-bit register, while LD A,(DE) and LD A,(BC) only load A). But I try to stay away from 16-bit arithmetic, and when I need it I move the contents of HL into DE with EX DE,HL. Another benefit of keeping the address in HL is that you can do arithmetic directly with the value it points to, e.g. ADD A,(HL), which adds the value pointed to by HL to the accumulator.

Another good idea is to lay out your tables so you can do the equivalent of a = *ptr++ instead of an indexed lookup. This often saves a lot. I used tables quite a lot in MSX Unleashed and other demos, but they are structured quite differently from how they would be in a C64 demo.
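In C terms, dvik's suggestion trades a per-access index computation for a sequential walk over data laid out in the order it is consumed. Here is a sketch of the two access patterns. The function names are hypothetical, and the reordered table is assumed to be generated ahead of time:

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed lookup: every access recomputes base + index
   (on the Z80: load the index into DE, ADD HL,DE, then LD A,(HL)). */
static unsigned sum_indexed(const uint8_t *table, const uint8_t *indices, size_t n) {
    unsigned sum = 0;
    for (size_t k = 0; k < n; k++) {
        sum += table[indices[k]];
    }
    return sum;
}

/* Sequential layout: values are pre-arranged in consumption order,
   so each access is just a = *ptr++ (on the Z80: LD A,(HL) / INC HL). */
static unsigned sum_sequential(const uint8_t *ptr, size_t n) {
    unsigned sum = 0;
    while (n--) {
        sum += *ptr++;
    }
    return sum;
}
```

With table {5, 6, 7, 8} and indices {3, 0, 2}, the indexed version reads 8, 5, 7. The pre-arranged array {8, 5, 7} gives the same sum of 20 with no address arithmetic per element.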

But the overall biggest saving you can make is to use all registers in both register banks and try to stay away from the IX and IY registers.

By PingPong

Prophet (3900)

11-07-2006, 02:03

> Using both register banks (the EXX instruction) can speed things up a lot. I almost never use the IX and IY registers because they are very slow.
>
> Contrary to what PingPong said, I prefer using HL as the address register.
>
> But the overall biggest saving you can make is to use all registers in both register banks and try to stay away from the IX and IY registers.

Most of the time, keeping addresses in HL forces us to do some math if HL serves as a base pointer, as in this example:

ld d,0              ; 7
ld hl,xxxxxxx       ; 10
push hl             ; 11  save the base pointer
ld e,someoffset     ; 7
add hl,de           ; 11
ld (hl),a           ; 7   store
pop hl              ; 10  restore the base pointer
ld e,someoffset+2   ; 7
add hl,de           ; 11
ld (hl),a           ; 7   store

takes 88 cycles.

ld ix,xxxxxxxx           ; 14
ld (ix+someoffset),a     ; 19  store
ld (ix+someoffset+2),a   ; 19  store

takes 52 cycles.

(I'm not counting the M1 wait state.)
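The 88 and 52 figures can be checked with a small C tally of the documented Z80 timings for the two sequences above. The m1_wait parameter is added here to model the extra wait state the MSX inserts on every M1 (opcode fetch) cycle. DD-prefixed IX instructions have two M1 cycles each. The struct and function names are hypothetical:

```c
#include <stddef.h>

/* T-states and M1 (opcode fetch) cycles per instruction. */
struct instr {
    const char *text;
    int tstates;
    int m1;
};

static const struct instr hl_version[] = {
    {"ld d,0", 7, 1},  {"ld hl,nn", 10, 1},  {"push hl", 11, 1},
    {"ld e,n", 7, 1},  {"add hl,de", 11, 1}, {"ld (hl),a", 7, 1},
    {"pop hl", 10, 1}, {"ld e,n", 7, 1},     {"add hl,de", 11, 1},
    {"ld (hl),a", 7, 1},
};

static const struct instr ix_version[] = {
    {"ld ix,nn", 14, 2}, {"ld (ix+d),a", 19, 2}, {"ld (ix+d),a", 19, 2},
};

/* Total cycles; m1_wait extra states are added per M1 cycle
   (0 reproduces the 88 and 52 quoted in the post). */
static int total_cycles(const struct instr *seq, size_t n, int m1_wait) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += seq[i].tstates + seq[i].m1 * m1_wait;
    }
    return sum;
}
```

With m1_wait = 0 the totals are 88 and 52. With one wait state per M1 cycle they become 98 and 58, so the IX version still wins by a wide margin.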

As we can see, the IX and IY registers are not as bad as we might think from reading only the timings in datasheets; it surely depends on what needs to be done.
However, most of the time we need pseudo-random access based on a value, and because of this, using HL and doing math on it has the following disadvantages:

- Part of HL's fast access is eaten up by having to load another register for the math and by the math operation itself.
- We destroy the previous value of HL and need extra code to save it in case we need the original value back.
- We do 16-bit math, which is slower than 8-bit math.
- Because of the 16-bit math we quickly use up all the Z80 registers, since it uses them in pairs (so we effectively have half as many registers available). As a result we run out of registers sooner, have to save and restore registers in memory, and end up with poorer performance than if we had simply used all the registers, including IX and IY.

By PingPong

Prophet (3900)

11-07-2006, 02:18

@tokumaru:
Far be it from me to compare CPU performance (does it make sense to compare two snails to see which one is less of a snail? Both are snails!). I only want to point out that most of the time things are not as easy as they appear at first. We can rarely say:

"I hate this way, the other one is ALWAYS FAR BETTER." Not at all; it depends.

Regarding speed: you will find the Z80 a bit slower if you have been coding for a 6502 running at half the Z80's clock. To be honest, because of the Z80's architecture, a 6502 running at 1 MHz performs like a Z80 at 2.5 MHz (again, it depends, but it's a roughly accurate figure).
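Applying PingPong's 2.5x rule of thumb to tokumaru's NES shows where "a bit slower" comes from. The 2.5 factor is only the rough estimate from the post. The clock values are the nominal NTSC NES CPU clock and the MSX Z80 clock, and the macro and function names are hypothetical:

```c
/* Rule of thumb from the post (rough estimate, not a measurement):
   a 6502 at f MHz performs about like a Z80 at 2.5 * f MHz. */
#define Z80_PER_6502 2.5

/* Nominal clocks in MHz: NTSC NES CPU and MSX Z80. */
#define NES_CPU_MHZ 1.789773
#define MSX_Z80_MHZ 3.579545

/* Z80 clock that would match a 6502 running at cpu6502_mhz. */
static double z80_equivalent_mhz(double cpu6502_mhz) {
    return cpu6502_mhz * Z80_PER_6502;
}

/* MSX Z80 speed relative to the NES 6502 under the rule of thumb. */
static double msx_vs_nes(void) {
    return MSX_Z80_MHZ / z80_equivalent_mhz(NES_CPU_MHZ);
}
```

msx_vs_nes() comes out at about 0.8. The MSX clock is twice the NES clock, but the rule asks for 2.5 times, so under this estimate the MSX Z80 would run at roughly 80% of the NES 6502's speed.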

What I liked about the 6502 was mainly:
- orthogonal ASM
- sophisticated addressing modes (post- and pre-indexed), which are very cool (if they were 16 bits wide they would be perfect!)

By tokumaru

Expert (83)

11-07-2006, 02:28

> (if they were 16 bits wide they would be perfect!)
The 65816 (the CPU used in the SNES) has them, and it's backwards compatible with the 6502, AFAIK.

I think all these old processors (used in 8 and 16-bit gaming systems) are fascinating.

By jltursan

Prophet (2619)

11-07-2006, 10:04

> I think all these old processors (used in 8 and 16-bit gaming systems) are fascinating.

Indeed! I love the Z80, but the 6809 beats it in every respect. It's a shame it never successfully broke the 2 MHz barrier; very few variants were produced above that speed :(

By GhostwriterP

Nemesis (666)

11-07-2006, 10:06

I use IX and IY a lot. All extra registers are welcome. They are still faster than ld a,(hhll). The jp (iy) instruction is also very useful, and it is faster as well. Unaligned criss-cross table access is another very useful feature. Not to mention that you can now use HL for math. You can even load data directly into a register pair, like ld l,(ix+0) / ld h,(ix+1), which frees DE for other stuff. You can also add or subtract data, and it takes the same number of cycles as a load (only 19 :P).
All in all, they are very useful.
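One reading of the "unaligned" point is that IX can point anywhere, with no 256-byte alignment needed, and can reach bytes on either side through a signed displacement. That interpretation, the sample addresses and the helper names below are assumptions. The C model shows the displacement range, a 16-bit load into HL, and arithmetic straight from memory:

```c
#include <stdint.h>

static uint8_t mem[65536];  /* model of the 64K address space */

/* LD r,(IX+d): d is a signed 8-bit displacement, so one IX value
   reaches 128 bytes below and 127 bytes above it, and IX itself
   can point anywhere (no page alignment required). */
static uint8_t ld_ix(uint16_t ix, int8_t d) {
    return mem[(uint16_t)(ix + d)];
}

/* ld l,(ix+d) / ld h,(ix+d+1): load a 16-bit value straight into HL,
   leaving DE free for other work. */
static uint16_t ld_hl_ix(uint16_t ix, int8_t d) {
    return (uint16_t)(ld_ix(ix, d) | (ld_ix(ix, (int8_t)(d + 1)) << 8));
}

/* ADD A,(IX+d): add the addressed byte to the accumulator directly
   (19 T-states, the same as LD A,(IX+d)). */
static uint8_t add_a_ix(uint8_t a, uint16_t ix, int8_t d) {
    return (uint8_t)(a + ld_ix(ix, d));
}
```

With IX = 0x8010, a displacement of -128 reaches 0x7F90, just below the page IX points into. An H/L page lookup cannot reach that address without changing H.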
