Assembly Z80, best way to divide by 16

Page 2/2

By santiontanon

Paragon (1830)

20-09-2022, 23:10

Ah, indeed, I disallowed numeric constants to prune the search space. I can try allowing 8-bit constants, but that'll take forever haha.

And indeed, I'll try it in the unpacker! If it's already been optimized, I doubt anything will be found, but let me give it a try! :)

By santiontanon

Paragon (1830)

21-09-2022, 04:42

Ah, yes, allowing constants: 0, 1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128, 240, 255

I get this in about 10 seconds, which is pretty much albs_br's second solution (other variations are found, but they are all equivalent):

ld a, h     ; A = H
rlca        ; four left rotations swap the two nibbles of A
rlca
rlca
rlca
and 15      ; keep the low nibble: A = H / 16

Allowing all constants from 0 to 255, I left it running for 5 minutes and it had not finished yet, so I stopped it hehe
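For readers who want to convince themselves that the sequence above really is a division by 16, here is a minimal Python model (not Z80 code) that checks it for every possible value of H. The helper names `rlca` and `div16_rotate` are made up for this sketch, and the carry flag that the real RLCA also updates is ignored.

```python
def rlca(a: int) -> int:
    """Model Z80 RLCA on an 8-bit value: rotate left, bit 7 wraps to bit 0.

    The carry flag is not modelled; it does not affect the result in A.
    """
    return ((a << 1) | (a >> 7)) & 0xFF


def div16_rotate(h: int) -> int:
    """Model of: ld a,h / rlca / rlca / rlca / rlca / and 15."""
    a = h
    for _ in range(4):
        a = rlca(a)   # after four rotations the nibbles are swapped
    return a & 15     # the old high nibble is now the low nibble


# Exhaustive check over every possible value of H.
mismatches = [h for h in range(256) if div16_rotate(h) != h // 16]
print(mismatches)  # prints []
```

An exhaustive check is cheap here because there are only 256 inputs.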

By [WYZ]

Champion (451)

21-09-2022, 09:09

Are RRD or RLD useful here?

Oops, theNestruo & GDX already thought of that.

By Micha

Expert (110)

21-09-2022, 10:20

The fastest way I can think of would be utterly impractical and bizarre, but it takes only 16 cycles (counting the extra M1 wait state on MSX):

ld l,0
ld a,(hl)

Disadvantages:
- it needs an absurd "lookup table" scattered across the entire 64K of memory; the table entries will be 256 bytes apart from each other, at &0000, &0100, &0200, ..., &FF00
- register L is not preserved

I would personally go for the lookup table as proposed in the first post of this topic...
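To make the scattered table concrete, here is a small Python model of the idea (a sketch, not Z80 code). The 64K `memory` array and the function name `div16_page_lut` are assumptions for illustration; on a real machine most of those addresses would of course be occupied by ROM, code or data.

```python
# Hypothetical flat 64K address space, all zeroes to start with.
memory = bytearray(0x10000)

# One table entry at the first byte of every 256-byte page:
# address H*256 holds H / 16.
for h in range(256):
    memory[h << 8] = h >> 4


def div16_page_lut(h: int) -> int:
    """Model of: ld l,0 / ld a,(hl), with the dividend already in H."""
    l = 0                  # ld l,0
    hl = (h << 8) | l      # HL now points at the start of page H
    return memory[hl]      # ld a,(hl)


print(div16_page_lut(0xC7))  # prints 12
```

Only 256 bytes are actually used, but they are spread so that no contiguous 256-byte block is left untouched.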

By theNestruo

Champion (429)

21-09-2022, 10:54

Micha wrote:

The fastest way I can think of would be utterly impractical and bizarre, but it takes only 16 cycles (counting the extra M1 wait state on MSX):

ld l,0
ld a, (hl)

Disadvantages:
- it needs an absurd "lookup table" scattered across the entire 64K of memory; the table entries will be 256 bytes apart from each other, at &0000, &0100, &0200, ..., &FF00
- register L is not preserved

I would personally go for the lookup table as proposed in the first post of this topic...

I see your utterly impractical and bizarre way and raise the bet (i.e.: it's faster, but even more impractical!):

ld l, h
ld a,(hl)

The LUT would now be at $0000, $0101, $0202, ..., $fefe, $ffff (oops! you cannot divide $ff)
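The diagonal layout can be modelled the same way in Python (a sketch, not Z80 code). The `memory` array and the name `div16_diagonal_lut` are hypothetical. The model also assumes that the "oops" refers to $ffff being unusable for table data, presumably because on MSX that address is the secondary slot select register; that reading is a guess, so the entry is simply left unset here.

```python
# Hypothetical flat 64K address space, all zeroes to start with.
memory = bytearray(0x10000)

# Entry for H lives at address H*257, i.e. high byte == low byte == H.
# $ffff is deliberately skipped (assumption: not usable for table data).
for h in range(255):               # $00 .. $fe
    memory[h * 0x101] = h >> 4


def div16_diagonal_lut(h: int) -> int:
    """Model of: ld l,h / ld a,(hl), with the dividend already in H."""
    l = h                  # ld l,h
    hl = (h << 8) | l      # HL = $hhhh
    return memory[hl]      # ld a,(hl)


print(div16_diagonal_lut(0xC7))  # prints 12
```

In this model, looking up $ff returns whatever happens to be at $ffff (0 here) instead of 15, which is the missing case mentioned above.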

By bore

Master (182)

21-09-2022, 11:57

While we are looking at impractical solutions, how about changing the way the value is represented?

	ld	a, h	; convert to 4.4 fixpoint and divide by 16
	;and	$f0	; drop fractional bits

As a bonus it can keep the fractional bits.
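A short Python illustration of that reinterpretation (a sketch; the function names `as_fixed_4_4` and `drop_fraction` are invented here): the byte stays exactly as it is, and only the meaning assigned to its bits changes, so the upper nibble is the integer part and the lower nibble counts sixteenths.

```python
def as_fixed_4_4(raw: int) -> float:
    """Interpret an 8-bit value as unsigned 4.4 fixed point (value = raw / 16)."""
    return raw / 16


def drop_fraction(raw: int) -> int:
    """Model of the optional: and $f0 (clear the four fractional bits)."""
    return raw & 0xF0


h = 0xC7
print(as_fixed_4_4(h))                 # prints 12.4375
print(as_fixed_4_4(drop_fraction(h)))  # prints 12.0
```

Whether this helps depends on the rest of the code being able to work with 4.4 values directly.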

By Micha

Expert (110)

21-09-2022, 16:28

theNestruo wrote:

I see your utterly impractical and bizarre way and raise the bet (i.e.: it's faster, but even more impractical!):

ld l, h
ld a,(hl)

The LUT would now be at $0000, $0101, $0202, ..., $fefe, $ffff (oops! you cannot divide $ff)

Great! Another 3 cycles shaved off!
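The cycle counts quoted in this thread can be reproduced with a bit of arithmetic. The Python sketch below assumes the standard Z80 T-state counts for these instructions and one extra wait state per M1 (opcode fetch) cycle on MSX; the table `TIMINGS` and the function `msx_cycles` are names made up for the example.

```python
# (T-states on a plain Z80, number of M1 cycles) for each instruction used.
TIMINGS = {
    "ld a,h":    (4, 1),
    "rlca":      (4, 1),
    "and n":     (7, 1),
    "ld l,n":    (7, 1),
    "ld l,h":    (4, 1),
    "ld a,(hl)": (7, 1),
}


def msx_cycles(instructions):
    """Sum the T-states, adding one wait state per M1 cycle (MSX assumption)."""
    return sum(t + m1 for t, m1 in (TIMINGS[i] for i in instructions))


rotate   = ["ld a,h", "rlca", "rlca", "rlca", "rlca", "and n"]
page_lut = ["ld l,n", "ld a,(hl)"]
diag_lut = ["ld l,h", "ld a,(hl)"]

print(msx_cycles(rotate))    # prints 33
print(msx_cycles(page_lut))  # prints 16
print(msx_cycles(diag_lut))  # prints 13
```

The difference between the last two is the 3 cycles mentioned above.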

By santiontanon

Paragon (1830)

21-09-2022, 19:02

Haha, awesome solutions! And some are not that crazy haha. Those spread-out LUTs might even be feasible in some demos with constrained values :)

By Grauw

Ascended (10819)

23-09-2022, 21:13

Haha, those are some great solutions indeed! Bore takes the win in my book :). Approaching the problem from a different angle like that can definitely lead to much faster algorithms. I certainly wouldn't dare call it impractical without knowing the precise intended application.

By albs_br

Champion (499)

01-10-2022, 15:38

theNestruo wrote:

If you are using the Z80 Assembly meter extension, there is a z80-asm-meter.platform setting. Set it to msx.

Thanks @theNestruo, it worked like a charm. No standard Z80 times anymore!
