What’s The Best Speed To Avoid Deadlock?

Who Wins When You Keep A Buggy Design Over Fixing It?

The Problem

Multithreaded software development is very hard to do right under the best of circumstances, but there are some pitfalls that just scream “deadlock” without needing a debugging trace.

On a backup application I worked on some years ago, I was improving performance, both perceived and real. After a number of changes that sped it up significantly, I found I could consistently put the app into a deadlock within two or three minutes. There was no way around it: you had to kill the app (or reboot the computer, as most users would after seeing the Mac’s Spinning Pizza Of Death) to go any further, which was pretty bad for users who expected their files to be backed up automatically.

After a period of investigation, I found two threads whose code had existed since the source code’s first check-in (well before my changes), each locking the same two objects but in the opposite order from the other. In other words, this was a classic case of deadlock: one thread could hold the first object while waiting for the second, while the other thread held the second and waited for the first. The only reason it had never deadlocked before was pure luck. The software had been so slow before my changes that the two threads apparently never reached those locks at the same moment.

An example of what I saw follows:

-(void)methodRanOnOneThread
{
    // Let's lock the microwaveObject so other threads won't
    // mess with it till we're done
    @synchronized (microwaveObject)
    {
        // Let's lock the refrigeratorObject so other threads
        // won't mess with it till we're done
        @synchronized (refrigeratorObject)
        {
            // Do some stuff with microwaveObject and
            // refrigeratorObject
        }
    }
}

-(void)methodRanOnAnotherThread
{
    // Let's lock the refrigeratorObject so other threads won't
    // mess with it till we're done
    @synchronized (refrigeratorObject)
    {
        // Let's lock the microwaveObject so other threads
        // won't mess with it till we're done
        @synchronized (microwaveObject)
        {
            // Do some stuff with refrigeratorObject and
            // microwaveObject
        }
    }
}

The obvious fix was to reverse the order of the @synchronized blocks in one of the methods so that both methods locked the two objects in the same order. I made that change and checked it in.
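One way to make a lock-ordering rule like this harder to break is to put it in a single helper, so that no caller decides the order on its own. What follows is a minimal sketch of that idea, not the code from the product. It uses plain C with POSIX mutexes instead of @synchronized, and every name in it (microwave_lock, refrigerator_lock, lock_pair, run_both_workers) is hypothetical. The helper always takes the mutex at the lower address first, so the two workers can name the locks in opposite orders and still acquire them in the same order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for the locks guarding the post's two objects. */
static pthread_mutex_t microwave_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t refrigerator_lock = PTHREAD_MUTEX_INITIALIZER;

/* Shared state that is only touched while both locks are held. */
static long shared_counter = 0;

enum { ITERATIONS = 100000 };

/* Acquire two distinct mutexes in one global order (lower address first),
   no matter which order the caller names them in. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Release both mutexes; release order does not affect deadlock safety. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

/* Names the microwave first, like methodRanOnOneThread. */
static void *worker_one(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&microwave_lock, &refrigerator_lock);
        shared_counter++;
        unlock_pair(&microwave_lock, &refrigerator_lock);
    }
    return NULL;
}

/* Names the refrigerator first, like methodRanOnAnotherThread. */
static void *worker_two(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&refrigerator_lock, &microwave_lock);
        shared_counter++;
        unlock_pair(&refrigerator_lock, &microwave_lock);
    }
    return NULL;
}

/* Run both workers to completion and return the final counter,
   or -1 if a thread could not be started. */
long run_both_workers(void)
{
    pthread_t first, second;
    shared_counter = 0;
    if (pthread_create(&first, NULL, worker_one, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&second, NULL, worker_two, NULL) != 0) {
        pthread_join(first, NULL);
        return -1;
    }
    pthread_join(first, NULL);
    pthread_join(second, NULL);
    return shared_counter;
}
```

Calling run_both_workers() from a main function should finish every time and return 200000, because neither thread can hold one lock while waiting on a lock the other thread took first. If lock_pair locked its arguments in the order given, the same two workers would have the opposite-order pattern shown in the methods above.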

Unfortunately, the original code had been written by a young developer who had extreme ownership issues with anything he wrote.  Since he couldn’t believe he’d made a mistake, this little change turned into a week of meetings with our manager, who eventually gave the developer another week to try to prove he could solve the problem by introducing some speed changes to one of the threads.

There are no pronounceable, respectful words I could use for that manager or my colleague.

After a week, the developer admitted defeat, and I began responding to calls from the recruiters who regularly contacted me.

Before I eventually moved on, that manager made two other executive decisions that convinced me to leave as soon as possible.

The first was that he was convinced, beyond any discussion, that supporting two mounted cloud-storage hard drives (aka NAS drives) would take twice as long to develop as supporting a single one (three times as long for three drives, four times for four drives, etc.), so to cut development time he decided our customers should not be able to mount more than one drive at a time.  I asked him if he’d ever heard of a for-loop; he looked at me like I’d grown a second head.  He had told me shortly after he took over the team (he wasn’t my team’s original manager) that he used to do software development.  His understanding of basic software development principles explains why he moved to management.

The manager’s second brilliant decision was not to back up the full data of Mac OS X files, because the Windows team’s architect said the corresponding Windows product didn’t support multi-fork/named-stream files (aka extended attributes on HFS+ volumes), so there was no need to do the same for HFS+ data.

In addition to explaining how unethical it would be to ship a cloud backup product that didn’t preserve all of a customer’s data, I explained that there was absolutely no technical reason preventing us from preserving it, since our cloud-based drives supported WebDAV, and the Linux-based OS running on our cloud storage supported AppleDouble, which was its method for preserving extended attributes.

At least the manager didn’t listen to the Windows architect when he suggested we use Microsoft Silverlight to do cross-platform software development.

When I finally gave my notice, I was asked to stay longer, but I could not fathom remaining any more than a week to offer the needed knowledge transfer.

The Solution

Software development requires many skills, but one of the most important, and one that cuts across practically all fields, is a level of humility: being willing to accept that your solution is not right, or not the best.  When you find yourself in an environment where colleagues or management consistently make poor choices that affect your ability to succeed, and they refuse to listen to reason (beware when “consensus” becomes a hammer), it’s best to move on or risk damaging your career by association.

Writing a sequel to Office Space might work, too!
