Grauw’s blog

GPGPU applications on MSX

September 28th, 2008

So ‘GPGPU’ (General Purpose processing on GPU) programs are all the rage nowadays. I was wondering, has anyone got good ideas for (calculation) tasks you could use the MSX V9938/58 VDP’s command engine for? Theoretically it is much faster than the Z80 :).

Some things the V9938 can do for you:

  • logical operations with other data (from VRAM or CPU): and, or, xor, not, timp, tand, tor, txor, tnot
  • 2 bit-precision rotates (in screen 6), i.e. multiplication and division by 4
  • 2 bit shifts (using and for bit masks)
  • 2 or 4 bit to 8 bit conversion and vice versa (using LMMC/LMCM)
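To make the rotate/shift items above a bit more concrete, here is a minimal Python sketch that models one byte as four 2-bit screen 6 pixels. It only simulates the arithmetic on the CPU side and does not drive a real VDP. The function names are hypothetical, and the sketch assumes the leftmost pixel sits in the highest two bits of the byte.

```python
def unpack_screen6(byte):
    """Split one byte into four 2-bit pixels (assumed: leftmost pixel = top bits)."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def pack_screen6(pixels):
    """Join four 2-bit pixels back into one byte."""
    value = 0
    for pixel in pixels:
        value = (value << 2) | (pixel & 0b11)
    return value


def shift_pixels_left(pixels, fill=0):
    """Move every pixel one position to the left, i.e. shift the byte by 2 bits."""
    return pixels[1:] + [fill]


def rotate_pixels_left(pixels):
    """Same as the shift, but the pixel that falls off re-enters on the right."""
    return pixels[1:] + pixels[:1]


value = 0b00010110  # 22
shifted = pack_screen6(shift_pixels_left(unpack_screen6(value)))
rotated = pack_screen6(rotate_pixels_left(unpack_screen6(0b11000001)))
```

In this model, moving the data one pixel to the left is the same as multiplying the byte by 4 (modulo 256), and moving it one pixel to the right would be the division by 4.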

Maybe there is more you can do by e.g. using screen 12, or by placing the data in memory in clever structures, or by using the ‘transparent’ versions of the operators, or by storing data in screen 5 and reading it in screen 6…? Maarten (ter Huurne) mentioned that you can color the area between two white dots on a black background by doing an XOR copy one pixel to the left; those kinds of techniques are interesting applications.
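One way to read the XOR fill trick is as a running XOR along a row of pixels. The Python sketch below simulates that idea on a list of 0/1 pixels. It assumes the copy overlaps itself by one pixel and is processed pixel by pixel, so that each written pixel feeds into the next one; the direction used here (propagating to the right) and the name `xor_fill` are assumptions for illustration, and the mirrored direction works the same way.

```python
def xor_fill(row):
    """Running XOR over a row: each pixel becomes itself XOR its already-updated neighbour."""
    out = list(row)
    for x in range(1, len(out)):
        out[x] ^= out[x - 1]
    return out


# Two white dots (1) on a black background (0).
row = [0, 0, 1, 0, 0, 0, 1, 0]
filled = xor_fill(row)
```

In this simulation the first dot switches the fill on and the second dot switches it off again, so the second dot itself ends up black.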

I was thinking about my Tiger Tree Hash implementation, which does a lot of 64-bit operations that are relatively slow on the Z80, and the VDP could easily apply them to 8 bytes in one go, with carry-over. But I would also need additions/subtractions and shifts by an odd number of bits, and TTH is not very easily parallelisable (iirc), so I would need to move a lot of data back and forth between the CPU and the VDP, which very likely causes too much overhead for GPGPU to be useful there.
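As a rough illustration of why the logical operations map well onto 8 separate bytes while additions do not, here is a small Python sketch. It is plain CPU-side arithmetic, not VDP code, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def xor64_bytewise(a, b):
    """64-bit XOR done as 8 independent byte XORs."""
    pairs = zip(a.to_bytes(8, "little"), b.to_bytes(8, "little"))
    return int.from_bytes(bytes(x ^ y for x, y in pairs), "little")


def add64_bytewise_no_carry(a, b):
    """64-bit 'addition' done per byte, dropping the carry between bytes."""
    pairs = zip(a.to_bytes(8, "little"), b.to_bytes(8, "little"))
    return int.from_bytes(bytes((x + y) & 0xFF for x, y in pairs), "little")


a = 0x0123456789ABCDEF
b = 0x00000000000000FF

xor_ok = xor64_bytewise(a, b) == a ^ b                          # bytes are independent
add_ok = add64_bytewise_no_carry(a, b) == (a + b) & MASK64      # carry is lost
```

The XOR result matches the real 64-bit XOR because no byte depends on its neighbours, whereas the per-byte addition goes wrong as soon as one byte overflows into the next.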

So, any thoughts? Other types of calculations the VDP (or another MSX chip) can do, perhaps with clever combinations of these? And any applications for this? Certain kinds of hashes (CRC?), bitmap conversion, certain 3D calculations, folding@home, mp3 decoding :)?

Grauw

Comments

Question by AR at 2008-09-29 22:01

Can you show in detail some algorithm or operation realized using the VDP?

Re: Question by Grauw at 2008-12-21 03:08

Well, that’s basically what I’m asking the reader :). It’s powerful, but of course also limited in what it can do.

But for example, the AES encryption algorithm consists of four basic steps that are applied several times:

  • SubBytes (table lookup)
  • ShiftRows (byte shift)
  • MixColumns (byte multiplication)
  • AddRoundKey (xor)

Of these four steps, the ShiftRows and AddRoundKey steps can be done by the VDP on its own. And the other two can probably be done by manipulating the address register during a copy command (HMMC, LMMC, LMCM), but that does of course involve the CPU.
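The two steps that look VDP-friendly can be sketched in a few lines of Python. This is only a CPU-side model of what the operations do to a 4×4 block of bytes, not VDP code; the row-major list-of-rows layout and the function names are assumptions for illustration.

```python
def shift_rows(state):
    """ShiftRows: rotate row r of the 4x4 state left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def add_round_key(state, round_key):
    """AddRoundKey: XOR every state byte with the matching round key byte."""
    return [
        [s ^ k for s, k in zip(state_row, key_row)]
        for state_row, key_row in zip(state, round_key)
    ]


state = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
shifted = shift_rows(state)
cleared = add_round_key(state, state)  # XOR with itself gives all zeros
```

ShiftRows is nothing more than moving bytes to a different position, and AddRoundKey is a plain XOR with other data, which is why both resemble a copy command with a logical operation.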

The only thing I do not know is whether AES has to process blocks sequentially or can process them in parallel. In other words, whether you can decode several ‘blocks’ in one go (which would be very fast), or whether each block depends on the block before it (which would need to involve the CPU for every step).