# Multiply A * 48

Page 1/3
| 2 | 3

That's it, if I want to multiply A * 48 I need a 16 bit register as output; so this code is probably the obvious approach:

```ld b,0
ld c,a
ld h,b
ld l,c
ld b,h
ld c,l               ; BC = A * 48```

...but seems a bit heavy, 105Tstates

Has anyone found a faster solution?

Can’t think of anything faster than that other than using a lookup table.

```Multiply48:
ld l,a
ld h,Multiply48_lut >> 8
ld c,(hl)
inc h
ld b,(hl)
ret

ALIGN 100H
Multiply48_lut:
REPT 100H, ?i
db ?i & 0FFH
ENDM
REPT 100H, ?i
db ?i >> 8
ENDM
```

34 cycles at the expense of 512 bytes of memory.

Indeed, it has a huge speedup at a big RAM cost,; but really nice alternative, thanks Grauw!. Btw, these directives are Glass-compatible, isn't it?

I'm thinking on a pretty obvious optimization if I return the result in HL, saving the copy to BC (10 Tstates); but it can impact on how it's the result used afterwards...

jltursan wrote:

Indeed, it has a huge speedup at a big RAM cost,; but really nice alternative, thanks Grauw!. Btw, these directives are Glass-compatible, isn't it?

Yeah they are. Key point is to first have 256 entries with the LSBs of the result value, and then another 256 with the MSBs, so that you can calculate the index faster and use `inc h`.

As for the non-LUT multiplication, changing the input and output registers is indeed the obvious optimisation, and if the input range is limited to 0-85 then 8-bit additions could be used for the x3.

Maybe 11(+2) cycles slower version with 272-byte table?

```Multiply48:
ld l,a
ld h,Multiply48_lut >> 8
ld b,(hl)
inc h
and 15
ld l,a
ld c,(hl)
ret
```

I have no experience at all with this, but do these things make sense (just thinking a bit about it):
- 48 = 2 x 2 x 2 x 2 x 3 = left shift of 4, plus some adds?
- or: 48 = 32 x 1.5: left shift of 5 on A + one right shift of A?
I'm sure I'm stating the obvious here, but I'm interested to know why it's not a good solution

Manuel wrote:

I'm sure I'm stating the obvious here, but I'm interested to know why it's not a good solution

The original question was how to do it faster than *3*16 and there is absolutely nothing wrong with doing it like that... We just try to think, how to do it faster.

oh, repeated `add hl, hl` is faster than shifting?

Provided the value is in (hl)

```xor a
rld
ld l,(hl)
ld h,a
ld d,h
ld e,l
```

In 72 cycles. Not as fast, but no memory usage.

When operating with 16 bits register, yes, it's faster. As there're no 16-bits shifting instructions in the Z80, you'll need to use two 8-bit shifts through carry to left-shift HL; so in the end, it's slower by every single shift.

Maybe Grauw has guessed how is this routine being used, but yes, "A" values can be constrained between 0-31 (you've enough clues now to guess a specific use for the routine).

So, sacrificing the general purpose of the routine and changing a bit how the registers are used, I've ended with the following:

```ld l,a
ld h,0
ld l,a
add hl,hl		; HL = A * 48 (76T)```

Not bad, a bit faster and only A and HL modified.

Now, best of all, working with only 32 possible A values, LUT based routines can reduce significantly the used RAM

Manuel wrote:

oh, repeated `add hl, hl` is faster than shifting?

Yes, ADD HL,HL is 11(+1) cycles while SLA L & RL H is 16(+2) cycles.

Page 1/3
| 2 | 3