When I was a child, my parents would often tell me to repeat what they just told me, since I usually wasn’t paying attention. Now I have to do the same thing with my own daughter. Payback time, it seems.
But this blog entry isn’t about parenting, it’s about error messages.
I was just writing some code and realized that an important rule when writing error messages is to repeat back what the user said. There are many violations of this rule; the first one that comes to mind is this one from Windows:
The system cannot find the path specified.
That error may be comprehensible if you just typed a command, but as part of a script, it will be entirely useless. Obviously, the pathname needs to be displayed (of course, we still don’t know what was being done, or why).
This becomes even more important when a user-specified value is modified in some way. For example, I had a command-line argument which could take a list. After breaking the list apart, I needed to validate the entries in the list. If I found anything invalid, I could have simply given the error “invalid parameter”. Useless! Rather, I filtered out the valid values and then printed out the offending ones: “invalid parameters: a,b,c”.
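Here is a minimal sketch of that list validation in Python. The function name, the comma separator and the idea of also listing the allowed values are my assumptions for illustration, not the original code:

```python
def split_and_validate(raw, allowed):
    """Split a comma-separated argument; fail if any entry is not allowed.

    `raw` is the string the user typed, `allowed` is a set of valid values.
    Both names are hypothetical.
    """
    entries = [item.strip() for item in raw.split(",") if item.strip()]
    # Keep only the offending entries so the message repeats back
    # exactly what the user got wrong.
    invalid = [item for item in entries if item not in allowed]
    if invalid:
        raise ValueError(
            f"invalid parameters: {','.join(invalid)} "
            f"(allowed: {','.join(sorted(allowed))})"
        )
    return entries
```

Calling `split_and_validate("x,a,y,b,c", {"x", "y", "z"})` raises an error naming only a, b and c, so the user does not have to work out which of the five entries were wrong.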
Now, repeat what I just said!
The more I work with Perforce the more I dislike it. I just wasted over an hour of my life doing what should be a trivial action: adding a user.
At this point a parenthetical rant is needed: I don’t think the administrator of an SCM system should have to do such things. User management should be an IT issue, and the project owner should be in charge of who can access their repositories. The SCM administrator should just be in charge of making sure the system is set up such that that is the case.
Since this Perforce server is at its licensed user limit, I have to first delete a user to make room. That should be a trivial operation, right?
$ p4 user -f -d jdoe
User jdoe has file(s) open on 1 client(s) and can't be deleted.
Huh? I don’t care about open files! Clearly the word “force” (the -f option) is being used in some strange way. Since there isn’t a “really force this damn deletion” option, I have to find the open file. First, look for the user’s “client”:
$ p4 clients -u jdoe
$
There are none? Obviously we have “client” sharing going on (I’ll leave that for another rant). Logically, I should be able to get a list of files this person has opened, but expecting logical behavior is, apparently, unrealistic:
$ p4 opened -u jdoe
Usage: opened [ -a -c changelist# -C client -m max ] [ files... ]
Invalid option: -u.
That’s fine, I can use grep, even though it could be imprecise. For example, imagine that we had a person named James Ava: grepping for “java” could yield countless false positives. Nevertheless, forging ahead:
$ p4 opened -a | grep jdoe
//depot/projects/releases/Something/3.14/src/ugh.c#5 - edit default change (xtext) by jdoe@goose
Sure enough, the client “goose” is owned by a different person who is active, so I can’t just delete it. Fortunately, I found another technote saying how to do this, so I do what it says:
$ p4 login jdoe
User jdoe logged in.
$ p4 -u jdoe -c goose -H goose.example.com revert -k //depot/projects/releases/Something/3.14/src/ugh.c
You don't have permission for this operation.
What?! I am the administrator. Super user. I have permission to do anything! So here we get to my usual pet peeve: lousy error messages. Even if we took the error message at face value, it is unhelpful since it doesn’t say what permission I need (besides “super”, that is). But the error message is undoubtedly incorrect; it is more likely that the server is refusing for some unrelated reason and, due to poor programming, that generic error message is displayed.
Of course, even if that latter command worked, it raises the question: why do I have to do all this menial work? This should all be rolled into a single command. It could be rolled into a script if the “revert” command, above, worked correctly.
I want those two hours of my life back. I could have used them more profitably working on the Perforce to SVN converter, and using it to get people off Perforce.
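As for the imprecise grep above: a slightly safer approach is to compare only the user name in the trailing “by user@client” part of each line. The sketch below assumes every line of `p4 opened -a` output looks like the single sample line shown earlier. I have not checked that against every Perforce version, and the function name is made up:

```python
def opened_by(lines, user):
    """Return (depot_file, client) pairs for files opened by exactly `user`.

    Assumes each line ends with " by user@client", as in the sample
    output above. Lines that do not fit that shape are skipped.
    """
    matches = []
    for line in lines:
        head, sep, owner = line.rstrip("\n").rpartition(" by ")
        if not sep:
            continue
        name, at, client = owner.partition("@")
        if at and name == user:
            # head looks like "//depot/path/file.c#5 - edit default change (xtext)"
            file_rev = head.partition(" - ")[0]
            depot_file = file_rev.rpartition("#")[0] or file_rev
            matches.append((depot_file, client))
    return matches
```

With this, a user named “java” no longer matches every `.java` file in the depot, since only the name before the `@` is compared.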
Another nice one from Thunderbird:
The hex number is a nice touch: provide the illusion of being specific and helpful while not actually doing so. The suggestion to contact a system administrator is a good one, as misery loves company.
The new installer for ClearCase is a mess in many ways, but this error made me cringe:
System kernel was failed to build properly and as a result MVFS was not loaded.
The bad grammar and missing punctuation are just a little extra insult to the injury of not being told HOW the kernel build failed, or, perhaps better yet, what to do about it.
Here’s one from Chrome:
Cute icon! Funny phrase! I guess those are supposed to distract us from the total uselessness of the error message.
Back in 1992 or so I was writing an email-based trouble-ticket system which tried to match up incoming emails to existing trouble-tickets by looking at the In-Reply-To: email header. Much to my chagrin, I found that a few email programs did not add this header when replying to messages. So I had to add a set of kludges to hook together tasks that were mistakenly broken by such email messages, and some subject-line shenanigans to allow tasks to be manually specified.
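For the curious, the matching logic amounted to something like the following Python sketch. It is a reconstruction of the idea, not the original code: the `[ticket 42]` subject tag format and the dictionary of known message IDs are assumptions made up for the example.

```python
import re
from email import message_from_string

# Hypothetical subject tag used to specify a ticket manually, e.g. "[ticket 42]".
_TICKET_TAG = re.compile(r"\[ticket\s+#?(\d+)\]", re.IGNORECASE)
_MESSAGE_ID = re.compile(r"<[^<>\s]+>")


def find_ticket(raw_message, tickets_by_message_id):
    """Return (ticket_id, how) for an incoming email, or (None, "unmatched").

    `tickets_by_message_id` maps the Message-ID of earlier emails to
    ticket numbers.
    """
    msg = message_from_string(raw_message)
    # First choice: the In-Reply-To header names the message being answered.
    for message_id in _MESSAGE_ID.findall(msg.get("In-Reply-To", "")):
        if message_id in tickets_by_message_id:
            return tickets_by_message_id[message_id], "in-reply-to"
    # Kludge: fall back to a ticket number typed into the subject line.
    tag = _TICKET_TAG.search(msg.get("Subject", ""))
    if tag:
        return int(tag.group(1)), "subject-tag"
    return None, "unmatched"
```

Returning how the match was made means a log line or error message can say why a reply ended up in a particular ticket, or why it didn’t.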
Well, in those days, email was a new thing, and so some amount of ignorance was understandable. Twenty years later, we have managed to add those couple of lines of code into every email program, right? Such hope is misplaced. While wrestling to get threading to work properly in Thunderbird, I find that it is still a problem! Viz., “The bad news is that not all e-mail clients actually generate these message headers.” Now, we aren’t talking about some ancient text-based email programs (ironically, they all got it right back in 1992; it was the Mac which was broken). The example given in the next sentence is Yahoo!
While I know firsthand about dealing with these sorts of broken email threads, it is sad that Thunderbird cannot get it right; none of the semi-hidden settings allow it to join together the multitude of broken email threads in my inbox.
The key to working with computers, it seems, is lowering your expectations.
I got this error, which definitely wins the quantity over quality prize:
23:56:50,545 [main] INFO historyLogger:84 - EXCEPTION CAUGHT: org.polarion.svnimporter.ccprovider.CCException: java.io.IOException: No space left on device
at org.polarion.svnimporter.ccprovider.internal.CCContentRetriever.getContent(CCContentRetriever.java:94)
at org.polarion.svnimporter.svnprovider.internal.actions.SvnAddFile.calculateLengthAndChecksum(SvnAddFile.java:104)
at org.polarion.svnimporter.svnprovider.internal.actions.SvnAddFile.dump(SvnAddFile.java:83)
at org.polarion.svnimporter.svnprovider.internal.SvnRevision.dump(SvnRevision.java:127)
at org.polarion.svnimporter.svnprovider.SvnDump.dump(SvnDump.java:191)
at org.polarion.svnimporter.main.Main.saveDump(Main.java:221)
at org.polarion.svnimporter.main.Main.run(Main.java:91)
at org.polarion.svnimporter.main.Main.main(Main.java:49)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at org.polarion.svnimporter.common.Util.copy(Util.java:303)
at org.polarion.svnimporter.common.FileCache.put(FileCache.java:72)
at org.polarion.svnimporter.common.FileCache.put(FileCache.java:87)
at org.polarion.svnimporter.ccprovider.internal.CCContentRetriever.getContent(CCContentRetriever.java:90)
at org.polarion.svnimporter.svnprovider.internal.actions.SvnAddFile.calculateLengthAndChecksum(SvnAddFile.java:104)
at org.polarion.svnimporter.svnprovider.internal.actions.SvnAddFile.dump(SvnAddFile.java:83)
at org.polarion.svnimporter.svnprovider.internal.SvnRevision.dump(SvnRevision.java:127)
at org.polarion.svnimporter.svnprovider.SvnDump.dump(SvnDump.java:191)
at org.polarion.svnimporter.main.Main.saveDump(Main.java:221)
at org.polarion.svnimporter.main.Main.run(Main.java:91)
at org.polarion.svnimporter.main.Main.main(Main.java:49)
So, after that deluge of “information”, all I know is that I ran out of space on a filesystem. Which filesystem, you ask? If they told us, that would ruin the fun of this guessing game!
The stack trace is a nice touch, since it leaves out anything useful, like what parameters were being passed. I used to display stack traces like this for my own programs, but have stopped doing so as they didn’t provide as much information as a well-written error message. This stack trace is much like driving directions which consist solely of the phrases “turn right” and “turn left”, but no street names, distances or landmarks. Largely useless.
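What I would rather see is the failing operation wrapped with the one detail the trace omits: the file being written. A minimal Python sketch of the idea follows. The names are hypothetical, and the `opener` parameter exists only so the failure can be simulated without actually filling a disk:

```python
import os


class CacheWriteError(Exception):
    """Raised when a cache file cannot be written."""


def write_cache_file(path, data, opener=open):
    """Write `data` (bytes) to `path`, naming the file if the write fails."""
    directory = os.path.dirname(os.path.abspath(path))
    try:
        with opener(path, "wb") as handle:
            handle.write(data)
    except OSError as exc:
        reason = exc.strerror or str(exc)
        # Repeat back what we were doing and where, then the OS's reason.
        raise CacheWriteError(
            f"cannot write {len(data)} bytes to cache file '{path}' "
            f"in directory '{directory}': {reason}"
        ) from exc
```

The original exception is still chained via `from exc`, so the stack trace is there for whoever wants it, but the first line already says which directory needs cleaning up.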
Thunderbird popped this one up one day:
I cannot remember the context, but then I shouldn’t have to! So, basically some unspecified operation failed, for an unknown reason. It’s nice they mentioned the network connection didn’t get cleaned up, though it’s rather useless information since I don’t know what should be done about it, let alone what the impact is. As a programmer I can guess that the dangling network connection is just a minor administrative detail which will get cleaned up on the next reboot. But I can imagine my mother getting this error message and being worried that viruses or spammers are going to sneak onto her computer this way.
There is a fine line between being user friendly and treating people like morons. It is apparent that some programmers think that users cannot be presented with meaningful details of error situations as it will scare or intimidate them. This crosses the line and is simply insulting. Case in point (from Google Chrome):
Wow, that’s terribly uninformative. In this case, I am trying to debug a mod_rewrite configuration (a Sisyphean task, to be sure) and I did figure out how to dig in and see the real error message, which distills down to this:
404 Not Found -- The requested URL /cgi-bin/w.pl was not found on this server.
Let’s think about this from a different context: if I were getting a bug reported against my web site, which error message would I prefer to be provided? The former would be utterly useless, and I would have to go back to the person and have them do a “view source” so I could see the real error message. It would have been trivial to include the error message from the server verbatim and there’s no valid reason to exclude it, except, perhaps, to keep from scaring them :)
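To show how little effort “verbatim” takes, here is a sketch using Python’s standard `urllib`. The function names and the 500-byte cap on the body are my own choices for the example, and this is obviously not how Chrome is written:

```python
import urllib.error
import urllib.request


def describe_http_error(err, body_limit=500):
    """Build a message from an HTTPError: URL, status, and the server's own words."""
    body = err.read(body_limit).decode("utf-8", errors="replace").strip()
    return (
        f"GET {err.url} failed: {err.code} {err.reason}\n"
        f"server said: {body or '(empty response body)'}"
    )


def fetch(url):
    """Fetch `url`, re-raising HTTP failures with the server's message included."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_http_error(err)) from err
```

A bug report containing that message tells me the URL, the status and what the server said, with no “view source” round trip.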
Some years ago I ran into a piece of code which shocked me, and in the time since then I have realized that it exemplified a lot of what is wrong with software. Sadly, I have since lost the code, so here is an approximation:
unless (open(F, "/some/important/file"))
{
    # We don't want to scare the users with an error message
    # warn "Unable to read config file";
}
Am I the only one who is outraged by this? What is scarier to a user: getting an error message when a genuine error situation occurred, or letting the software plod on, producing even stranger and more nonsensical errors which cascade from this initial problem? For example, imagine the following code further on:
my $req = $http->request($config->{url});
die "Unable to contact web server $config->{url}\n" unless $req;
The config structure was empty because it could not be read due to the earlier problem, so the error message simply says “Unable to contact web server”. So now you are led to believe that the problem is with some unspecified web server. How much time will you waste trying to track that down?
So which is worse, “scared” or confused and frustrated?
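For contrast, here is a sketch of the alternative: fail at the point of the real problem and name the file. It is in Python rather than Perl, and the JSON config format, the `ConfigError` class and the required `url` key are assumptions for the example:

```python
import json


class ConfigError(Exception):
    """Raised when the configuration cannot be loaded or is incomplete."""


def load_config(path):
    """Load a JSON config file, failing loudly and naming the file."""
    try:
        with open(path, encoding="utf-8") as handle:
            config = json.load(handle)
    except OSError as exc:
        # Report the real problem here instead of letting it resurface
        # later as a misleading "web server" error.
        raise ConfigError(
            f"unable to read config file '{path}': {exc.strerror or exc}"
        ) from exc
    except json.JSONDecodeError as exc:
        raise ConfigError(f"config file '{path}' is not valid JSON: {exc}") from exc
    if not isinstance(config, dict) or not config.get("url"):
        raise ConfigError(f"config file '{path}' has no 'url' setting")
    return config
```

The user may well be “scared” by the message, but it points at the actual file, which beats an hour spent chasing a web server that was never the problem.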
To kick off my error message “hall of shame” series, I thought I should share my all-time favorite. I got this one many years ago: I was minding my own business and suddenly this popped up in the middle of my screen:
I did not have the presence of mind to take a screenshot back then, so this is “faked” from memory, but all the essentials are here: an empty title bar, so I have no idea which program generated the error; the “unknown error”, which deepens the mystery; and the “ok” button, which serves as a cruel, taunting punchline.
I never figured out which program issued this error; everything seemed to continue normally. Great mysteries, indeed.
There’s an old joke told many years ago by those who didn’t like Unix:
Ken Thompson has an automobile which he helped design. Unlike most automobiles, it has neither speedometer, nor gas gauge, nor any of the numerous idiot lights which plague the modern driver. Rather, if the driver makes any mistake, a giant “?” lights up in the center of the dashboard. “The experienced driver”, he says, “will usually know what’s wrong.”
I’m sure the early versions of ed inspired this. Though in those days, when every byte counted, a certain level of terseness was understandable. And the software was simple enough that there were a limited number of things which could be going wrong.
But now our computers are orders of magnitude bigger and more complicated. We have layer upon layer of drivers, libraries and applications, which nobody can understand in their entirety. And we still have a giant “?” lighting up on our dashboard. The combination of sloppy (or nonexistent) error handling and poor error reporting means that we all encounter incomprehensible or meaningless out-of-context error messages on a regular basis. Increasingly, I feel that this is the key problem with computers these days: we expend much of our time, energy and morale on the struggle of figuring out what the latest incomprehensible error message means.
Therefore, I will be devoting some time here to cataloging terrible error messages I run into and some of the bad programming practices that lead to them. I thought I should provide some warning (and context) before I vent my spleen.