Schrijver
| Fastest possible multiplication routine?
|
dvik msx master Berichten: 1312 | Geplaatst: 05 Mei 2007, 11:15   |
I'll send you the version I did earlier. Its not complete but I think I can make it complete based on your version. I started doing a similar routine but didn't finish all cases. I tested numbers 0<L<48 and the routine I made is about 13% faster. Not sure how much bigger the code would be but probably 4-6 times as big or something (I don't do fallthroughs).
But if you want really fast and have a lot of memory you can do a single table lookup. I was actually considering that for the app I was writing.
Would be interesting to know what multiplication routine (if any) alone coder used in his speccy port of wolfenstein.
|
|
Prodatron msx master Berichten: 1109 | Geplaatst: 05 Mei 2007, 11:39   |
Ok, I am looking forward to it 
Regarding the lookup table: Maybe I missunderstood the way, you want to do it, but wouldn't it have a size of 256*65536 bytes?
I use only lookup-tables for the division-part: X will be converted into 1/X, so I can realize formulars like A=X*Y/Z without divisons but two multiplications.
Btw, the bit-shifting instructions may add some optimization. The problem with them is, that you will loose the possibility to have 16bit results. Ok, if you don't need it, it's better to use them... Not easy to have ONE best routine  |
|
AuroraMSX
 msx master Berichten: 1250 | Geplaatst: 05 Mei 2007, 12:23   |
|
|
NYYRIKKI msx master Berichten: 1511 | Geplaatst: 05 Mei 2007, 12:37   |
|
|
NYYRIKKI msx master Berichten: 1511 | Geplaatst: 05 Mei 2007, 12:53   |
|
|
GhostwriterP msx addict Berichten: 313 | Geplaatst: 05 Mei 2007, 12:55   |
Not to mention that those add hl,hl's and such are also much faster on r800.
|
|
Prodatron msx master Berichten: 1109 | Geplaatst: 05 Mei 2007, 14:13   |
|
|
dvik msx master Berichten: 1312 | Geplaatst: 06 Mei 2007, 00:34   |
I think the multiplication algorithm to choose depends on the program using it. The table version NYYRIKKI shown is very nice for small numbers -64 to +63, but for bigger numbers it slows down a bit because the lookup math exceeds 8 bits and of course the table grows. But for many cases especially on an MSX with a low resolution, the range is enough. Prodatrons algorithm has a much bigger range one 8 bit value -128 to 127 multiplied with a 16 bit value and its only about 50% slower than the table version.
Btw, one of my IOCCC entries uses the same hyperbolic functions as the table based multiplication algorithm, but in this case to generate prime numbers. |
|
|
|
|