Author
| Code optimization
| ARTRAG msx master Posts: 1592 | Posted: 07 February 2008, 12:06   | It has been a long time since anyone mentioned ASM coding on MRC.
Time to fix that.
Can this code be optimized?
The two functions are supposed to copy the background under a frame out of, and back into, a "room" of size map_w x map_h (the height does not matter).
The shape of the frame is taken from external data, and the background tiles are stored in a buffer.
The position in the room and the address of the buffer that stores the background are passed
as parameters in registers BC and DE.
The frame number is passed on the stack.
;de source_addr (position in the room)
;bc dest_addr (background buffer)
;ix+4 and ix+5 nframe
global _npctgrab
_npctgrab:
push ix
ld ix,0
add ix,sp
push de
ld e,(ix+4)
ld d,(ix+5)
ld hl,_frames
add hl,de
add hl,de
ld e,(hl)
inc hl
ld d,(hl) ; de points to the current frame
push de
pop ix ; now ix points to the current frame
pop hl ; hl points to the source in the room
ld d,b ; bc pointed to the destination in the frame buffer
ld e,c ; now de points to the destination in the frame buffer
1: ld a,(ix+0) ; 127 == end
cp 127
jp z,3f
ld c,a
ld b,0
push hl
add hl,bc ; source
ld c,(ix+1) ; len
inc ix
inc ix
add ix,bc
ldir
pop hl
ld bc,(_map_w)
add hl,bc
jp 1b
3: pop ix
pop hl
pop af
jp (hl)
;de source_addr (background buffer)
;bc dest_addr (position in the room)
;ix+4 and ix+5 nframe
global _npctrest
_npctrest:
push ix
ld ix,0
add ix,sp
push de
ld e,(ix+4)
ld d,(ix+5)
ld hl,_frames
add hl,de
add hl,de
ld e,(hl)
inc hl
ld d,(hl) ; de points to the current frame
push de
pop ix ; now ix points to the current frame
pop hl ; hl points to the source in the frame buffer
ld d,b ; bc pointed to the destination in the room
ld e,c ; now de points to the destination in the room
1: ld a,(ix+0) ; 127 == end
cp 127
jp z,3f
ld c,a
ld b,0
push de
ex de,hl
add hl,bc ; destination (hl and de are swapped here)
ex de,hl
ld c,(ix+1) ; len
inc ix
inc ix
add ix,bc
ldir
pop de
ld bc,(_map_w)
ex de,hl
add hl,bc
ex de,hl
jp 1b
3: pop ix
pop hl
pop af
jp (hl)
The frame data are structured like this:
framex1:
db 0,2,18,19 ; X offset of line 0, length of line 0, data, data, etc.
db 0,2,20,21 ; X offset of line 1, length of line 1, data, data, etc.
db 127 ; end of the frame
frame0:
db 5,1,147
db 4,1,147
db 3,1,147
db 2,1,147
db 1,1,147
db 127
etc
_frames:
dw framex1,frame0,frame1,frame2,frame3,frame4,frame5, etc etc
| | ARTRAG msx master Posts: 1592 | Posted: 07 February 2008, 13:35   | (I mean optimized for speed, naturally)
| | ro msx guru Posts: 2320 | Posted: 07 February 2008, 15:48   | Well, using the index registers (IX and IY) is never a clever trick where speed is concerned. They're slow. Using HL and doing some INCs and DECs will speed it up already. Make intelligent tables so you don't have to inc/dec too many times.
Comparing the accumulator with 127 using CP, like you do, can be done faster by using AND #7F, JP NZ,xxxx
just some thoughts...
| | Metalion msx freak Posts: 215 | Posted: 07 February 2008, 15:49   | Try not to use the IX register; it will increase speed.
EDIT: posted at the same time as the previous message...
| | Huey msx professional Posts: 582 | Posted: 07 February 2008, 15:57   |
AFAIK the ASM code is called using Hitech-C. It puts parameters in the IX register....
| | ARTRAG msx master Posts: 1592 | Posted: 07 February 2008, 16:05   | @Huey
Not really, Hitech-C puts parameters on the stack before calling the function
and asks the called function not to modify the value of IX.
@ro and Metalion
I'd like to avoid IX and IY in the loop, but I do not know how; this is why I am asking for help
| | MicroTech msx lover Posts: 109 | Posted: 07 February 2008, 16:28   | Quote:
| I'd like to avoid IX and IY in the loop, but I do not know how; this is why I am asking for help
|
Do you have an equivalent C source?
Maybe it can be re-compiled with ASCII-C (which does not use index registers) and we can take inspiration from the resulting asm code.
| | ARTRAG msx master Posts: 1592 | Posted: 07 February 2008, 16:42   | @MicroTech
No, this code is hand-written and designed to be called from code compiled with Hitech-C.
This affects only the way in which input parameters are passed, and implies the need to restore IX on exit.
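As a sketch of what that convention means in practice, the entry and exit sequence shared by both routines can be isolated as follows. The label `_skeleton` is hypothetical; the stack layout is read off the posted code, which takes one 16-bit stack argument and removes it itself before returning.

```asm
; Sketch only: _skeleton is a made-up name, the layout is taken from the posted routines.
global _skeleton
_skeleton:
push ix         ; the caller expects ix to be preserved
ld ix,0
add ix,sp       ; ix+0/ix+1 = saved ix, ix+2/ix+3 = return address
ld e,(ix+4)     ; ix+4/ix+5 = the 16-bit argument passed on the stack
ld d,(ix+5)
; ... body: ix may be reused here because it was saved,
; ... but the stack must be balanced again before the exit sequence
pop ix          ; restore the caller's ix
pop hl          ; hl = return address
pop af          ; discard the stack argument
jp (hl)         ; return to the caller
```

Only the `push ix` / `pop ix` pair and the stack argument are imposed by the caller; everything between them is free to use any register.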
| | ARTRAG msx master Posts: 1592 | Posted: 07 February 2008, 17:00   | Needless to say, I've tried to avoid the use of IX, but I do not see any real solution
| | jltursan msx professional Posts: 847 | Posted: 07 February 2008, 17:10   | Optimization based on avoiding the use of index registers is a good idea, but there isn't much iteration over these instructions. Most of the time is spent in LDIR, so I think the best idea is to unroll the LDIR and repeat (height) times, (width) LDIs... at a size cost, of course
I've just remembered a "Fast LDIR routine" posted somewhere in the forum; it was based on LDI of course, but with variable length, not custom as is now the case....
| | ARTRAG msx master Posts: 1592 | Posted: 07 February 2008, 17:13   | Sad to say, most of the time LDIR moves only 1 or 2 bytes at a time...
but this depends on the shape of the frame, so I cannot unroll it as I do not know the length of line X in advance
| | jltursan msx professional Posts: 847 | Posted: 07 February 2008, 17:16   | Damn...
If it's no more than 2 bytes (I suppose not), maybe you can use alternate methods like a simple LD ($NNNN),HL. If it's definitely variable between 1-n (with n>2 possible) then you are stuck with LDIs; I can't see any other way to transfer data, faster I mean.
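To put rough numbers on the LDIR versus LDI trade-off, here is a sketch for a line whose length is known in advance to be 2 bytes. That fixed length is a hypothetical case, not the variable-length frames of this thread. The T-state counts are the plain Z80 figures; extra wait states on MSX hardware are not included.

```asm
; Sketch only: two alternative ways to copy 2 bytes, hl = source, de = destination.

; variant A: generic copy
ld bc,2         ; 10 T states
ldir            ; 21 + 16 = 37 T states (21 per byte while bc is still non-zero, 16 for the last byte)

; variant B: unrolled copy, no counter needed
ldi             ; 16 T states
ldi             ; 16 T states, total 32
; note: each ldi still decrements bc, so bc is modified on exit
```

In general N unrolled LDIs take 16N T states against 21N-5 for LDIR, so the saving is 5(N-1): nothing for a 1-byte line and 5 T states for a 2-byte line.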
| | Metalbrain msx friend Posts: 15 | Posted: 07 February 2008, 17:20   |
push de
pop ix ; now ix points to the current frame
If you don't mind using the undocumented instructions, I think this is faster:
ld ixh,d
ld ixl,e
| | ARTRAG msx master Posts: 1592 | Posted: 07 February 2008, 17:21   | It all depends on the shape of the current frame, and in the game
it can vary a lot, from a simple NxM rectangle (e.g. a door) to a vine hanging from a tree (lots of lines with only one byte and different offsets).
| | ARTRAG msx master Posts: 1592 | Posted: 07 February 2008, 17:28   | Quote:
|
push de
pop ix ; now ix points to the current frame
If you don't mind using the undocumented instructions, I think this is faster:
ld ixh,d
ld ixl,e
|
push de ; 11 T states
pop ix ; 14 T states
total 25
ld ixh,d ; 8 T states
ld ixl,e ; 8 T states
total 16
Yes, this is a saving, but not what I was hoping for (it is outside the inner loop, so almost negligible...)
Anyway, I'll change
push de
pop ix ; now ix points to the current frame
to
db 0xdd
ld h,d
db 0xdd
ld l,e
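The `db 0xdd` form presumably exists because the assembler in use does not accept the undocumented mnemonics; that is an assumption, as the thread does not say so. A sketch of the encoding behind it: the 0xDD prefix makes the H or L operand of the following instruction refer to IXH or IXL.

```asm
; Sketch only: byte-level view of the replacement.
db 0xdd
ld h,d          ; ld h,d assembles to 0x62, so DD 62 acts as ld ixh,d
db 0xdd
ld l,e          ; ld l,e assembles to 0x6B, so DD 6B acts as ld ixl,e
; after these four bytes ix = de; de is unchanged and the stack is not touched
```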