Duff's Device

July 17th, 2007
Duff's Device is an optimization technique for loop unrolling that allows faster execution of loops by minimizing the number of tests. Tom Duff was credited for this unique C implementation:

switch ( count % 8 )
{
case 0: do { *to++ = *from++;
case 7: *to++ = *from++;
case 6: *to++ = *from++;
case 5: *to++ = *from++;
case 4: *to++ = *from++;
case 3: *to++ = *from++;
case 2: *to++ = *from++;
case 1: *to++ = *from++;
} while (( count -= 8 ) > 0);
}


I couldn't believe my eyes when my friend Mark showed me this piece of code some years ago. I couldn't imagine that this thing would even compile. If you feel the same, just try it and see for yourself that this is perfectly valid C code.

What confused me was the combined use of the switch statement and the do-while loop and how those two interfere to each other. I asked myself, what a do-while loop really is? In fact, it's nothing more than one label, one if-statement and one goto statement. If you rewrite the code above in terms of labels and gotos you will end up with something like this:

switch ( count % 8 )
{
case 0:
start:
*to++ = *from++;
case 7:
*to++ = *from++;
case 6:
*to++ = *from++;
case 5:
*to++ = *from++;
case 4:
*to++ = *from++;
case 3:
*to++ = *from++;
case 2:
*to++ = *from++;
case 1:
*to++ = *from++;
if (( count -= 8 ) > 0) goto start;
}


This is equivalent to the original code, but it looks way too familiar. At this point you may wonder "Why interference between switch and loop statement is allowed in C?" The short answer is "Why not?" The long answer could be that the purpose of the C programming language was to make programmers' lives easier, but by no means to deprive the unlimited power and control that assembly so generously offered. If something could be implemented in assembly, it should also be implementable in C, only look prettier...

Because C# does not allow fall through, the closer you can get to Duff's Device is this ugly implementation:

if (count % 8 == 7) goto case_7;
if (count % 8 == 6) goto case_6;
if (count % 8 == 5) goto case_5;
if (count % 8 == 4) goto case_4;
if (count % 8 == 3) goto case_3;
if (count % 8 == 2) goto case_2;
if (count % 8 == 1) goto case_1;

case_0:
toBuffer[i++] = fromBuffer[i++];
case_7:
toBuffer[i++] = fromBuffer[i++];
case_6:
toBuffer[i++] = fromBuffer[i++];
case_5:
toBuffer[i++] = fromBuffer[i++];
case_4:
toBuffer[i++] = fromBuffer[i++];
case_3:
toBuffer[i++] = fromBuffer[i++];
case_2:
toBuffer[i++] = fromBuffer[i++];
case_1:
toBuffer[i++] = fromBuffer[i++];
if (( count -= 8 ) > 0) goto case_0;


In Java things are even worse. Java allows fall through, but will not allow interference between the switch statement and the do-while loop. Also Java's lack of a goto statement makes impossible to follow the other two approaches.

3 Responses to “Duff's Device”

  1. Matthew Whited Says:
    how about this?
    static void DuffsDevice(T[] from, T[] to)
    {
    int count = from.Length;
    int position = 0;

    switch (count % 8)
    {
    case 0:
    to[position] = from[position++];
    goto case 7;
    case 7:
    to[position] = from[position++];
    goto case 6;
    case 6:
    to[position] = from[position++];
    goto case 5;
    case 5:
    to[position] = from[position++];
    goto case 4;
    case 4:
    to[position] = from[position++];
    goto case 3;
    case 3:
    to[position] = from[position++];
    goto case 2;
    case 2:
    to[position] = from[position++];
    goto case 1;
    case 1:
    to[position] = from[position++];
    if ((count -= 8) > 0) goto case 0; else break;
    }
  2. Charles Young Says:
    Or this, which is similar, but closer to the original...

    namespace DuffsDevice
    {
    using System;

    class Program
    {
    static void Main(string[] args)
    {
    unsafe
    {
    short[] data = new short[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    short register = -1;

    fixed (short* dataPtr = &data[0])
    {
    DuffsDevice(&register, dataPtr, data.Length);
    }

    Console.WriteLine(register);
    }
    }

    ///
    /// C# approximation to Duff's device.
    ///
    ///
    ///
    unsafe static void DuffsDevice(short* to, short* from, int count)
    {
    int n = (count + 7) / 8;

    switch (count % 8)
    {
    case 0:
    *to = *from++;
    goto case 7;
    case 7:
    *to = *from++;
    goto case 6;
    case 6:
    *to = *from++;
    goto case 5;
    case 5:
    *to = *from++;
    goto case 4;
    case 4:
    *to = *from++;
    goto case 3;
    case 3:
    *to = *from++;
    goto case 2;
    case 2:
    *to = *from++;
    goto case 1;
    case 1:
    *to = *from++;
    if (--n > 0) goto case 0; else break;
    }
    }
    }
    }
  3. Benoit Alain Says:
    It think you guys are missing the point of the device. There should be no useless branching throughout the loop, just one from the 'switch' before the loop starts and then one from the 'while' each time the loop has to continue its course.

    Your C# code leads to a much slower loop that branches 8 times every go. Unless the compiler's optimizer sees it and washes it away, the additional branching will consume more resources than the simple copying operations that you initially wanted to do.

    While you can't use Duff's Device in all languages, the following code is usually accessible, not dramatically slower, nor much uglier:

    int i = 0;

    // first iteration only, much slower than Duff's Device
    int first_iteration_count = (count + 7) % 8 + 1; // 0→8, 1→1, etc.
    do {
    toBuffer[i] = fromBuffer[i]; i++; // careful with i++ here!
    } while(i 0);

    The code I suggest has the big problem of repeating the same statement in two separate loops. When modifying the second loop, it is easy to forget to correct the first one accordingly. Hopefully, not much loops will have to be unwrapped by hand.

Leave a Reply