Ah, indeed, I disallowed numeric constants to prune the search space. I can try allowing 8-bit constants, but that'll take forever haha.
And indeed, I'll try it in the unpacker! If it's already been optimized, I doubt anything would be found, but let me give it a try!
Ah, yes, allowing constants: 0, 1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128, 240, 255
I get this in about 10 seconds, which is pretty much albs_br's second solution (other variations are found, but they are all equivalent):
ld a, h
rlca
rlca
rlca
rlca
and 15
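For anyone who wants to double-check that sequence without an emulator, here is a minimal Python sketch. It only models the A register (flags are ignored), and the function names `rlca` and `div16_rotate` are made up for this illustration.

```python
def rlca(a: int) -> int:
    """Rotate an 8-bit value left by one bit; bit 7 wraps around into bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF


def div16_rotate(h: int) -> int:
    """Model of: ld a, h / rlca (x4) / and 15."""
    a = h
    for _ in range(4):
        a = rlca(a)      # after four rotations the two nibbles are swapped
    return a & 15        # keep only what used to be the high nibble


# Exhaustive check over every possible value of H.
assert all(div16_rotate(h) == h >> 4 for h in range(256))
```

Four left rotations swap the nibbles, so the final mask leaves exactly the old high nibble, which is H divided by 16.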
Allowing all constants from 0 to 255, I left it running for 5 minutes and it still had not finished, so I stopped it hehe
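To give a feel for why the constant list matters, here is a toy breadth-first search in Python. It is only a sketch and not the tool used above: it assumes a tiny instruction set (`rlca`, `rrca` and `and n`), models only the A register, and ranks programs by instruction count rather than cycles. The names `build_ops` and `search` are invented for this example.

```python
from collections import deque

CONSTANTS = [0, 1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127, 128, 240, 255]


def build_ops(constants):
    """Return a name -> function table for the toy instruction set."""
    ops = {
        "rlca": lambda a: ((a << 1) | (a >> 7)) & 0xFF,
        "rrca": lambda a: ((a >> 1) | (a << 7)) & 0xFF,
    }
    for c in constants:
        ops[f"and {c}"] = lambda a, c=c: a & c
    return ops


def search(target, constants=CONSTANTS, max_len=6):
    """Find a shortest program (after 'ld a, h') computing target(h) for all h."""
    ops = build_ops(constants)
    start = bytes(range(256))                      # A == H for every input H
    goal = bytes(target(h) for h in range(256))
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        state, program = queue.popleft()
        if state == goal:
            return program
        if len(program) == max_len:
            continue
        for name, fn in ops.items():
            nxt = bytes(fn(a) for a in state)      # run the op on all 256 inputs
            if nxt not in seen:                    # skip equivalent programs
                seen.add(nxt)
                queue.append((nxt, program + [name]))
    return None


print(search(lambda h: h >> 4))
```

Every extra constant adds one more candidate instruction at every step, so the branching factor grows with the size of the list. This toy stays fast only because it deduplicates equivalent states and knows very few instructions.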
Are RRD or RLD useful here?
Oops, TheNestruo & GDX already thought of that.
The fastest way I can think of would be utterly impractical and bizarre, but it takes only 16 cycles (machine + T1):
ld l,0
ld a,(hl)
Disadvantages:
- it needs an absurd "lookup table" scattered through the complete 64k of memory; the values of the table will be 256 bytes apart from each other at &0000, &0100, &0200, .... &FF00
- register l is not preserved
I would personally go for the lookup table as proposed in the first post of this topic...
I see your utterly impractical and bizarre way and raise the bet (i.e.: it's faster, but even more impractical!):
ld l, h
ld a,(hl)
The LUT would now be at $0000, $0101, $0202, ..., $fefe, $ffff (oops! you cannot divide $ff)
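To make the two scattered table layouts concrete, here is a small Python model of a flat 64K address space. It is purely illustrative: a `bytearray` stands in for memory, it ignores whatever else would normally live at those addresses, and all function names are hypothetical.

```python
def build_page_lut() -> bytearray:
    """One entry per 256-byte page, at $0000, $0100, ..., $FF00 (for 'ld l,0')."""
    mem = bytearray(65536)
    for h in range(256):
        mem[h << 8] = h >> 4
    return mem


def build_diagonal_lut() -> bytearray:
    """One entry at $0000, $0101, ..., $FFFF (for 'ld l,h')."""
    mem = bytearray(65536)
    for h in range(256):
        mem[(h << 8) | h] = h >> 4
    return mem


def lookup_page(mem: bytearray, h: int) -> int:
    hl = h << 8            # ld l, 0   -> HL = H * 256
    return mem[hl]         # ld a, (hl)


def lookup_diagonal(mem: bytearray, h: int) -> int:
    hl = (h << 8) | h      # ld l, h   -> HL = H * 257
    return mem[hl]         # ld a, (hl)


page, diagonal = build_page_lut(), build_diagonal_lut()
assert all(lookup_page(page, h) == h >> 4 for h in range(256))
assert all(lookup_diagonal(diagonal, h) == h >> 4 for h in range(256))
```

In both layouts H alone selects the table entry, which is why no arithmetic is needed at lookup time. The price is 256 single bytes pinned at fixed addresses across the whole address space.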
While we are looking at impractical solutions, how about changing the way the value is represented?
ld a, h ; convert to 4.4 fixpoint and divide by 16
;and $f0 ; drop fractional bits
As a bonus it can keep the fractional bits.
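The trick is that nothing is computed: the same byte is simply read as a 4.4 fixed-point number, so its value is implicitly divided by 16. A small Python sketch of that reinterpretation follows, using `Fraction` only to show the exact value; the helper names are invented for this example.

```python
from fractions import Fraction


def as_fixed_4_4(byte: int) -> Fraction:
    """Read a byte as 4.4 fixed point: high nibble integer, low nibble sixteenths."""
    return Fraction(byte, 16)


def drop_fraction(byte: int) -> int:
    """Model of the optional 'and $f0': clear the four fractional bits."""
    return byte & 0xF0


h = 0xA7
print(as_fixed_4_4(h))                    # exact quotient, fraction kept
print(as_fixed_4_4(drop_fraction(h)))     # integer part only

# With the fractional bits dropped, the 4.4 value equals the integer division.
assert all(as_fixed_4_4(drop_fraction(b)) == b >> 4 for b in range(256))
```

The catch is that every later use of the value has to treat it as 4.4 fixed point as well, so this only pays off if the surrounding code is written with that representation in mind.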
I see your utterly impractical and bizarre way and raise the bet (i.e.: it's faster, but even more impractical!):
ld l, h
ld a,(hl)
The LUT would now be at $0000, $0101, $0202, ..., $fefe, $ffff (oops! you cannot divide $ff)
Great! Another 3 cycles shaved off!
Haha, awesome solutions! And some are not that crazy haha. Those spread-out LUTs might even be feasible in some demos with constrained values.
Haha, those are some great solutions indeed! Bore takes the win in my book. Approaching the problem from a different angle like that can definitely lead to much faster algorithms. I certainly wouldn't dare call it impractical without knowing the context of the intended application.
If you are using Z80 Assembly meter, there is a z80-asm-meter.platform setting. Set it to msx.
Thanks @theNestruo, it worked like a charm. No standard Z80 times anymore!