This was a mix of superoptimization and coding by hand (superopt currently can't deal with the zero flag).
I remember the superoptimizador… that thing is cool! I imagine that on today's PCs, with multithreading (?), it runs a lot quicker now? I liked the overview page you used to have, with some example optimisation problems passed through it. Would be neat to see that resurrected and expanded!
That page still lives: Superopt
These days we not only have more threads but also more memory, so I could cache intermediate results. I guess I'll get back to that project; it may be more useful now.
Hmm, I get a mess... did I apply your patch correctly?
struct sat
    y db 0
    x db 0
    f db 0
    c db 0
ends

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; plot enemies and bullets if visible in the current SAT in ram
;
; depends on xmap,ymap

_plot_enemy:
    ld iy,(alt_ram_sat)
    ld ix,enemies
    ld bc,(max_enem + max_plyr_bullets + max_enem_bullets)*256+0

    ld hl,-128
    ld de,(ymap)
    and a
    sbc hl,de
    ld (tempy),hl

    ld hl,(xmap)
    ld de,-32
    add hl,de
    ld (tempx),hl

.npc_loop1:
    bit 0,(ix+enemy_data.status)
    jp z,.invisible

    ld l,(ix+enemy_data.y+0)
    ld h,(ix+enemy_data.y+1)
    ld de,(tempy)
    add hl,de               ; hl = enemy.y - (ymap + 128)
    ld de,128+16            ; hl = enemy.y - (ymap + 128) + 128 + 16 >=0
    add hl,de               ; hl = enemy.y - ymap + 16 >=0
    jr nc,.invisible        ; !(-16 <= enemy.y - ymap < 128)

    ld a,l
    add a,64-16             ; a = enemy.y - ymap + 64
    ld (iy+sat.y+0),a
    ld (iy+sat.y+4),a       ; not needed if single layer but in this way it is overall faster

    ld l,(ix+enemy_data.x+0)
    ld h,(ix+enemy_data.x+1)
    ld de,(tempx)           ; CF is reset by previous add
    sbc hl,de               ; hl = enemy.x + 32 - xmap < 0
    jp m,.invisible         ; hl <0 <==> dx = enemy.x - xmap < -32

    ld a, l                 ; 5
    sub 32                  ; 8
    ld e, a                 ; 5
    ld a, h                 ; 5
    sbc a, 0                ; 8
    jr c,.has_ec            ; 13/8
    jr nz,.invisible        ; 13/8
    ld l, e                 ; 5
.has_ec:
    and 128                 ; 8
    or (ix+enemy_data.color) ; 21
    ld e, a                 ; 5

    ld a,(ix+enemy_data.frame)
    ld (iy+sat.x),l         ; write X
    ld (iy+sat.f),a         ; write shape
    ld (iy+sat.c),e         ; write colour

    ld (ix+enemy_data.plane),c ; save SAT plane
    inc c
    set 7,(ix+enemy_data.status) ; set it as visible

    cp 16*4                 ; hard coded in the SPT
    jp nc,.two_layers
.one_layer:
    ld e,sat
    add iy,de
    jp .next

.invisible
    res 7,(ix+enemy_data.status) ; set it as invisible
.next:
    ld de,enemy_data
    add ix,de
    djnz .npc_loop1

    ld a,c
    ld (alt_visible_sprts),a
    ret

.two_layers:
    ld (iy+sat.x+4),l       ; second layer X
    add a,4
    ld (iy+sat.f+4),a       ; second layer shape
    ld a,e
    and 0xF0
    inc a                   ; second layer is always black
    ld (iy+sat.c+4),a
    inc c
    ld e,2*sat
    add iy,de
    jp .next
I think the first jump before the patch should be jp c instead of jp m. Also, the code expects to have xmap-32 in tempx, instead of 32-xmap. (I was working from your code before bore's patch.)
and speaking of the superoptimizador, the link you posted has a description and examples of it, but I do not see the actual optimizer. Is it available online anywhere? I have been considering creating something like that myself for a while!
But for RAM operations the accumulator (A) is better than HL!
 21  ld l,(ix+enemy_data.y+0)
 21  ld h,(ix+enemy_data.y+1)
 22  ld de,(tempy)
 12  add hl,de              ; hl = enemy.y - (ymap + 128)
 11  ld de,128+16           ; hl = enemy.y - (ymap + 128) + 128 + 16 >=0
 12  add hl,de              ; hl = enemy.y - ymap + 16 >=0
  8  jr nc,.invisible       ; !(-16 <= enemy.y - ymap < 128)
 --
107
 14  ld a,(tempy+0)
 21  add (ix+enemy_data.y+0)
  5  ld e,a
 14  ld a,(tempy+1)
 21  adc (ix+enemy_data.y+1)
  8  jr nz,.invisible       ;high byte not 0 => outside 8bit window
  5  ld a,e
  8  cp 128+16
  8  jr nc,.invisible
 --
104
Even better, HL is still free to use! That can save a dozen cycles somewhere else.
The 16-bit version looks faster and was easier to develop, but it is actually the slower one (107 vs. 104 cycles).
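To double-check that the two listings above test the same window, here is a small C model of both. It is only a sketch: `visible_16bit`, `visible_8bit` and `count_disagreements` are made-up names, and the carry flag is emulated with wider integers. One assumption to flag loudly: the byte-wise variant is modelled with tempy precomputed as 16-ymap instead of -128-ymap. The post does not say this, but the "high byte must be zero" test only seems to line up with -16 <= enemy.y - ymap < 128 under that bias.

```c
#include <stdint.h>

/* 16-bit listing: tempy = -128 - ymap, add 128+16, visible when carry is set. */
static int visible_16bit(uint16_t enemy_y, uint16_t ymap)
{
    uint16_t tempy = (uint16_t)(-128 - ymap);     /* precomputed before the loop */
    uint16_t hl = (uint16_t)(enemy_y + tempy);    /* add hl,de : enemy.y - (ymap + 128) */
    uint32_t sum = (uint32_t)hl + (128 + 16);     /* add hl,de with de = 128+16 */
    return (int)(sum >> 16);                      /* bit 16 plays the role of CF */
}

/* Byte-wise listing. ASSUMPTION (not in the post): tempy = 16 - ymap. */
static int visible_8bit(uint16_t enemy_y, uint16_t ymap)
{
    uint16_t tempy = (uint16_t)(16 - ymap);
    unsigned lo = (tempy & 0xFFu) + (enemy_y & 0xFFu);   /* ld a,(tempy+0) / add */
    uint8_t e = (uint8_t)lo;                             /* ld e,a */
    unsigned hi = ((unsigned)(tempy >> 8) + (unsigned)(enemy_y >> 8)
                   + (lo >> 8)) & 0xFFu;                 /* ld a,(tempy+1) / adc */
    if (hi != 0)
        return 0;                                        /* jr nz,.invisible */
    return e < 128 + 16;                                 /* cp 128+16 / jr nc,.invisible */
}

/* Compare both models for every enemy.y and a sample of 256 ymap values. */
static long count_disagreements(void)
{
    long bad = 0;
    for (uint32_t ymap = 0; ymap < 0x10000u; ymap += 257)
        for (uint32_t y = 0; y < 0x10000u; y++)
            if (visible_16bit((uint16_t)y, (uint16_t)ymap)
                != visible_8bit((uint16_t)y, (uint16_t)ymap))
                bad++;
    return bad;
}
```

Calling `count_disagreements()` from a small `main` and checking that it returns 0 is a cheap sanity test before committing to either version in assembly.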
Sure, the superoptimizer itself is on my GitHub.
Your work on optimization via brute force research is fascinating.
How do you describe the algorithm you want the program to encode?
The desired function is encoded using standard C:
unsigned char final2 (unsigned char a, unsigned char h)
{
    signed short x = (signed short)(((unsigned short)h << 8) | a);
    return x > 256+32;
}
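A nice property of a spec like this is that the input space is tiny (two bytes, so 65536 cases), which means any candidate can be checked exhaustively. The sketch below shows that idea in plain C. It is not taken from the actual superoptimizer: `candidate` is a hypothetical hand-written byte-wise version and `count_mismatches` is a made-up helper. It also assumes the usual two's-complement wrap when converting to `signed short`, which is implementation-defined in C but holds on common compilers.

```c
#include <stdint.h>

/* Specification, copied from the post above. */
unsigned char final2(unsigned char a, unsigned char h)
{
    signed short x = (signed short)(((unsigned short)h << 8) | a);
    return x > 256+32;
}

/* Hypothetical candidate: decide x > 288 by looking at the bytes only. */
static unsigned char candidate(unsigned char a, unsigned char h)
{
    if (h & 0x80)
        return 0;                   /* sign bit set: x is negative */
    if (h >= 2)
        return 1;                   /* x >= 512 */
    return (unsigned char)(h == 1 && a > 32);   /* h == 1: 256 + a > 288 */
}

/* Exhaustive equivalence check over all 256 * 256 inputs. */
static long count_mismatches(void)
{
    long bad = 0;
    for (unsigned h = 0; h < 256; h++)
        for (unsigned a = 0; a < 256; a++)
            if (final2((unsigned char)a, (unsigned char)h)
                != candidate((unsigned char)a, (unsigned char)h))
                bad++;
    return bad;
}
```

A candidate is only acceptable when `count_mismatches()` returns 0; a search that tries many instruction sequences would presumably run the same kind of comparison on each one.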
I've tested your patch with xmap-32 in tempx and with jp c.
It seems to work when X is in 128-256, but not outside that interval.
If I use jp m, it seems to work for x<256 but not for larger values.
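When results differ by X range like this, a reference model to compare against can help. The C sketch below is only my reading of the comments in the listing, not confirmed intended behaviour: a sprite is visible for -32 <= dx < 256 with dx = enemy.x - xmap, and for negative dx the early clock bit (bit 7 of the colour byte, which shifts the sprite 32 pixels left on the VDP) is set and dx+32 is written as X. `sprite_x` and `clip_x` are made-up names, and plain `int` is used so nothing wraps, unlike the 16-bit registers.

```c
#include <stdint.h>

typedef struct {
    int visible;   /* 1 when the sprite should be written to the SAT */
    uint8_t x;     /* value for sat.x */
    uint8_t ec;    /* 0x80 when the early clock bit must be set, else 0 */
} sprite_x;

/* Reference model of the horizontal clipping (assumed window: -32 <= dx < 256). */
static sprite_x clip_x(int enemy_x, int xmap)
{
    sprite_x r = {0, 0, 0};
    int dx = enemy_x - xmap;

    if (dx < -32 || dx > 255)
        return r;                    /* off screen on either side */

    r.visible = 1;
    if (dx < 0) {
        r.x = (uint8_t)(dx + 32);    /* early clock moves it back 32 pixels */
        r.ec = 0x80;
    } else {
        r.x = (uint8_t)dx;
    }
    return r;
}
```

Feeding the same enemy.x and xmap values to this model and to the assembly routine (in an emulator, for example) should show at which dx the two start to disagree.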