Lighting
From AlephWiki
Lighting Textures
Unknown author
You can light a 16bpp (RGB 5-5-5) pixel by a 5-bit fixed-point lighting value (0-32 inclusive, where 32 means full brightness) in about 5 cycles (about the time a LUT lookup takes), if I remember right.
in PPC assembler:
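    // r3 is the pixel; its high 17 bits MUST be off
    // r4 is the light, 0-32 inclusive
    rlwimi r3, r3, 15, 32-25, 31-20     // copy green into bits 20-24
    rlwinm r3, r3, 0, 32-5, 31-10       // clear green's old bits 5-9 (wrapping mask)
    mullw  r3, r3, r4                   // scale all three channels with one multiply
    rlwimi r3, r3, 32-15, 32-15, 31-10  // copy green's high 5 bits (25-29) into bits 10-14
    rlwinm r3, r3, 32-5, 32-15, 31-0    // shift right 5 and keep only the low 15 bits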
in C for Metrowerks PPC compilers:
    // light is 0-32 inclusive
    // no plain C will ever generate rlwimi from MW's compilers, hence the intrinsic
    pixel = __rlwimi(pixel, pixel, 15, 32-25, 31-20);
    pixel &= 0xFFFFFC1F;
    // if this does not compile to an rlwinm, use:
    // pixel = __rlwinm(pixel, 0, 32-5, 31-10);
    pixel = pixel * light;
    pixel = __rlwimi(pixel, pixel, 32-15, 32-15, 31-10);
    pixel = (pixel >> 5) & 0x00007FFF;
    // if this doesn't make an rlwinm, use:
    // pixel = __rlwinm(pixel, 32-5, 32-15, 31-0);
    // pixel now holds the lit 0RRRRRGGGGGBBBBB value; the 0x7FFF mask has already cleared the high 17 bits
in English:
load the pixel value into an unsigned long: pixel is 00000000000000000RRRRRGGGGGBBBBB
load the light value (0-32 inclusive) into an unsigned long: light is 00000000000000000000000000LLLLLL
copy the green channel 15 bits to the left (into bits 20-24): pixel is 0000000GGGGG00000RRRRRGGGGGBBBBB
zero out the green channel's old position (bits 5-9): pixel is 0000000GGGGG00000RRRRR00000BBBBB
multiply pixel by light, putting the result in pixel; each 10-bit field now holds channel*light, with upper case marking its high 5 bits, i.e. (channel*light)>>5, and lower case the low 5 bits that get discarded: pixel is 00GGGGGgggggRRRRRrrrrrBBBBBbbbbb
copy the high 5 bits of the green field on top of the low 5 bits of the red field: pixel is 00GGGGGgggggRRRRRGGGGGBBBBBbbbbb
shift pixel right by 5 bits and mask out the high 17 bits: pixel is 00000000000000000RRRRRGGGGGBBBBB
notes: this works because multiplies and shifts commute ((x<<y)*z == (x*z)<<y), multiplying by a power of two is equivalent to a shift (x*32 == x<<5), and multiplication distributes over addition (x*z+y*z == (x+y)*z). Together these let a single instruction do all three multiplications: ((G<<20) + (R<<10) + B)*L == ((G*L)<<20) + ((R*L)<<10) + B*L. Since each product is at most 31*32 = 992, it fits in its 10-bit field without carrying into the next one, so the high 5 bits of each field are exactly (channel*light)>>5. In addition, because of two very cool instructions the PPC has (rlwimi and rlwinm, both taking 1 cycle), we can do the setup and cleanup for this multiply in two instructions each. I'm not absolutely sure about this, but I believe the multiply itself will take 2 cycles on most of the PPC family of processors. If you get two of these going at once, properly interleaved, that results in about 5 cycles per pixel (not counting loads), which is quite reasonable.
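The reason for writing the mask constants as 32-x and 31-x is that PowerPC numbers bits from the msb end of the word (bit 0 is the msb), and I prefer counting from the lsb end: a field covering lsb-numbered bits hi down to lo is written MB = 32-(hi+1), ME = 31-lo. The compiler or assembler folds the subtraction away, so it costs nothing and makes the code a lot easier to read. When MB is greater than ME the mask wraps around the end of the word, which is how 32-5, 31-10 keeps everything except bits 5-9.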
32-bit lighting can be done with similar methods using two multiplies instead of one (three 16-bit product fields, 48 bits in all, can't fit in one word), and thus two more cycles and one more register. Colored lighting, however, takes three multiplies and a lot more setup no matter what, and is best done at blit time instead of render time.
P.S. This is part of the current version of the first in a series of pixel-processing and pixel-moving articles I'm going to be writing for a Mac-centric development website that is currently in development (ironic, eh?), so commentary is quite welcome.
P.P.S. I'm not 100% sure I got all those mask bit limits right.
