mikeage.net Logo
mikeage.net/2006/10/26/duffs-device/

mikeage.net @ ז׳ שבט תשע״ח

Duff's Device

No C programmer can call himself (or herself) a real programmer without having seen (and been amazed by) Duff's Device. It was invented as a more efficient alternative to regular loop unrolling used in copying bytes from memory to a register (each byte went to the same address in memory), by handling the case when there are "leftover" bytes after a full unroll.

The original code looked something like:

do {
  *to = *from++;
} while (--count > 0);

In Duff's application (real time animation), this was too slow. The standard solution would be to unroll the loop, to minimize the number of jumps:

register n = (count / 8)
do {
  *to = *from++;
  *to = *from++;
  *to = *from++;
  *to = *from++;
  *to = *from++;
  *to = *from++;
  *to = *from++;
  *to = *from++;
} while (--n > 0);

but then one is left with somewhere between [0,8) bytes remaining, which have to be copied manually. By playing with the number of unrolled statements, one can balance minimum loops vs loops at the end, but there is still the need for some special handling.

Duff's device interleaves a switch / case statement in with the do / while loop. This, coupled with C's default fall through behavior (cases "fall through" into the next case if not aborted with a break statement), allows the following code to work:

register n = (count + 7) / 8;
  switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }

This code starts by handling the special case (the remainder), and then performs as many batches of 8 as are necessary. Note that the switch is only used to trigger count % 8 copies; after that, the while loop takes over (the beauty is that it knows to jump back to the do { statement even though it may not have started there!), and copies 8 bytes at a time.

Note that this is NOT designed to replace memcpy, but rather to provide an efficient way to copy many bytes into one.

Leave a Reply

Quick Map
Content +
Personal +
Archives +
Site Stuff +
RBS Weather +
Search +
Recent Images

Valid XHTML 1.1!
Screen Pge
 

Last Modified: September 04, 2006 @ 02:11 CST

Memory(TRUE): 4194304/4194304
Memory(FALSE): 2986904/2996888