Steve Freeman

Test-Driven

Test-Driven Development is not an elite technique.

This “Darwinian” post, TDD Is Not For the Weak, says that not everyone can cope with TDD: it’s for “the Alpha, the strong, the experienced”. I don’t want to believe this, because I think that developers who can’t cope with any level of TDD shouldn’t be coding at all, so I won’t.

This claim has been made for every technical innovation I’ve seen so far (objects, event-driven programming, etc, etc). Sometimes it’s true, but most of the time it’s about what people are used to. Michael Feathers pointed out a while ago that the Ruby community is happily exploiting techniques such as meta-programming that were traditionally regarded as needing a safe pair of hands. What’s changed is that a generation has grown up with meta-programming and doesn’t regard it as problematic. Of course, there will be a degradation in understanding as an idea rolls out beyond its originators, but there’s still some value that gets through.

Sure, there’s a role for people to help the generation that is struggling to pick up a new technique, but that doesn’t mean that TDD itself will always be beyond the reach of mortal developers.

Keep tests concrete

This popped up on a technical discussion site recently. The original question was how to write tests for code that invokes a method on particular values in a list. The problem was that the tests were messy, and the author was looking for a cleaner alternative. Here’s the example test; it asserts that the even-positioned elements in the parameters are passed to bar in the appropriate sequence.

public void testExecuteEven() {
  Mockery mockery = new Mockery();

  final Bar bar = mockery.mock(Bar.class);
  final Sequence sequence = new NamedSequence("sequence");

  final List<String> allParameters = new ArrayList<String>();
  final List<String> expectedParameters = new ArrayList<String>();

  for (int i = 0; i < 3; i++) {
    allParameters.add("param" + i);
    if (i % 2 == 0) {
      expectedParameters.add("param" + i);
    }
  }
  final Iterator<String> iter = expectedParameters.iterator();

  mockery.checking(new Expectations() {
    {
      while (iter.hasNext()) {
        one(bar).doIt(iter.next());
        inSequence(sequence);
      }
    }
  });

  Foo subject = new Foo();
  subject.setBar(bar);
  subject.executeEven(allParameters);
  mockery.assertIsSatisfied();
}

The intentions of the test are good, but its most striking feature is that there’s so much computation going on. This doesn’t need a new technique to make it more readable; it just needs to be simplified.

A unit test should be small and focussed enough that we don’t need any general behaviour. It just has to deal with one example, so we can make it as concrete as we like. With that in mind, we can collapse the test to this:

public void testCallsDoItOnEvenIndexedElementsInList() {
  final Mockery mockery = new Mockery();
  final Bar bar = mockery.mock(Bar.class);
  final Sequence evens = mockery.sequence("evens");

  final List<String> params =
    Arrays.asList("param0", "param1", "param2", "param3");

  mockery.checking(new Expectations() {{
    oneOf(bar).doIt(params.get(0)); inSequence(evens);
    oneOf(bar).doIt(params.get(2)); inSequence(evens);
  }});

  Foo subject = new Foo();
  subject.setBar(bar);
  subject.executeEven(params);
  mockery.assertIsSatisfied();
}

To me, this is more direct, a simpler statement of the example—if nothing else, there’s just less code to understand. I don’t need any loops because there aren’t enough values to justify them. The expectations are clearer because they show the indices of the elements I want from the list (an alternative would have been to put in the expected values directly). And if I pulled the common features, such as the mockery and the target object, into the test class, the test would be even shorter.

The short version of this post is: be wary of any general behaviour written into a unit test. The scope should be small enough that values can be coded directly. Be especially wary of anything with an if statement. If the data setup is more complicated, then consider using a Test Data Builder.
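A Test Data Builder can be sketched in a few lines of plain Java. The Order type, its fields, and the defaults below are my own illustration, not anything from the post:

```java
// Minimal sketch of a Test Data Builder; Order, its fields, and the
// defaults are illustrative examples, not taken from the post.
class Order {
    final String customer;
    final int quantity;

    Order(String customer, int quantity) {
        this.customer = customer;
        this.quantity = quantity;
    }
}

class OrderBuilder {
    // Sensible defaults mean a test mentions only the values it cares about.
    private String customer = "a customer";
    private int quantity = 1;

    OrderBuilder withCustomer(String customer) { this.customer = customer; return this; }
    OrderBuilder withQuantity(int quantity) { this.quantity = quantity; return this; }

    Order build() { return new Order(customer, quantity); }
}

public class TestDataBuilderSketch {
    public static void main(String[] args) {
        // Only the quantity matters to this (imaginary) test; the rest defaults.
        Order order = new OrderBuilder().withQuantity(3).build();
        System.out.println(order.customer + " ordered " + order.quantity);
    }
}
```

Each test then states only the values it actually cares about, and everything else falls back to a known default, which keeps the setup concrete without repeating boilerplate in every test.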

Java synchronisation bug on OS X?

I’ve come across what might be a synchronisation bug while working on the book.

The end-to-end tests for our example application use the WindowLicker framework to drive the Swing user interface. Our test infrastructure starts the application up in another thread (it’s as close as we can get to running from the command line), then creates a WindowLicker driver which, eventually, creates a Java AWT Robot. It turns out (we think) that this means that we have two threads trying to load and initialise the AWT library in parallel, which hangs. Our workaround is to call a delaying method before creating the WindowLicker Driver:

private void 
makeSureAwtIsLoadedBeforeStartingTheDriverOnOSXToStopDeadlock() {
  try {
    SwingUtilities.invokeAndWait(
      new Runnable() { public void run() {} });
  } catch (Exception e) {
    throw new Defect(e);
  }
}

That’s not really what invokeAndWait() is for, but it solves our problem until we can find a better answer, and we hope that the hack is at least self-explanatory.

Does anyone have a better explanation or fix? OS X 10.5.6, Java 1.5.0_16, White MacBook 2.4 GHz Intel Core Duo. Nat’s Linux installation works fine.

Mock Roles not Objects, live and in person.

At the recent Software Craftsmanship conference in London, Willem and Marc ran a session on Responsibility-Driven Development with Mocks for about 30 people. Nat Pryce and I were sitting at the back watching and occasionally heckling.

The first striking thing was that when Willem and Marc asked who was using “Mock Objects”, almost everyone put their hand up (which was nice), but then only a handful also said they were thinking about Roles and Responsibilities when they did (which was frustrating). We first wrote up these ideas in our paper “Mock Roles Not Objects”, and much of the difficulty we see people have with the technique of Mock Objects comes from focussing on classes rather than relationships.

As it happens, an example popped up in the rest of the session, which was run as a Coding Dojo. What was interesting to me was how the group managed to turn around its design ideas. Here’s what I can remember about how it worked out.

The domain was some kind of game, with a hero who moves around an environment slaying dragons and so forth. The first couple of stories were to do with displaying the current room, and then moving from one room to another. It was a little difficult getting started because the limitations of the event didn’t allow enough time to really drive the design from outer-level requirements, but the group managed to get started with something like:

describe Hero do
  it "should describe its surroundings" do
    room = mock("Room")
    console = mock("Console")
    hero = Hero.new(room)

    room.stub!(:description).and_return("a room with twisty passages")

    console.should_receive(:show).with("in a room with twisty passages")
    hero.look(console)
  end
end

The expectation here says that when looking, the hero should write a text describing the room to the console. This was a place to start, but it doesn’t look right. Why is a hero attached to a room? And hero.look(console) just doesn’t read well; it’s hard to tell what it means. The tensions became clearer with the next feature, which was to have the hero move from one room to another. If we write

hero.move_to(other_room)

how can we tell that this has worked? We could ask the hero to look() again, but that means making an extra call for testing, which is not related to the intent of the test. We could ask the hero what his current room is, but that’s starting to leak into Asking rather than Telling. There may be a need for the hero to hold on to his current location, but we haven’t seen it yet.

Suddenly, it became clear that the dependencies were wrong. We already have a feature that can be told about the hero’s situation, which we can build on. If the feature were to be told about what is happening to the hero, we could use that to detect the change in room. So, our example now becomes:

describe Hero do
  it "should move to a room" do
    room = mock("Room")
    console = mock("Console")
    hero = Hero.new(console)

    room.stub!(:description).and_return("a room with twisty passages")

    console.should_receive(:show).with("in a room with twisty passages")

    hero.move_to(room)
  end
end

That’s better, but it’s not finished. The term Console sounds like an implementation, not a role. Most of the sword-wielding adventurers that I know don’t know how to work a Console, but they’re quite happy to tell of their great deeds to, say, a Narrator (as David Peterson suggested). If we adjust our example, we get:

describe Hero do
  it "should move to a room" do
    room = mock("Room")
    narrator = mock("Narrator")
    hero = Hero.new(narrator)

    room.stub!(:description).and_return("a room with twisty passages")

    narrator.should_receive(:says).with("in a room with twisty passages")

    hero.move_to(room)
  end
end

The whole example now reads as if it’s in the same domain, in the language of a D&D game. It doesn’t refer to implementation details such as a Console—we might see that code when we get to the detailed implementation of a Narrator. Obviously, there’s a lot more we could do; for a start, I’d like to see more structured messages between Hero and Narrator, but the session ran out of time at about this point.

Some lessons:

  1. Naming, naming, naming. It’s the most important thing. A coherent unit of code should have a coherent vocabulary; it should read well. If not, I’m probably mixing concepts, which will make the code harder to understand and more brittle to change than it needs to be.
  2. When I’m about to write a test, I ask “if this were to work, who would know?” That’s the most revealing question in B/TDD. If there’s no visible effect from an event, except perhaps for changing a field in the target object, then maybe it’s worth waiting until there is a visible effect, or maybe there’s a concept missing, or maybe the structure isn’t quite right. Before writing more code, I try to make sure I understand its motivation.

Willem’s (and many other people’s) approach is slightly different. He likes to explore a bit further with the code before really sorting out the names, and he’s right that there’s a risk of Analysis-Paralysis. I do that occasionally, but my experience is that the effort of being really picky at this stage forces me to be clearer about what I’m trying to achieve, to ask those questions I really ought to have answers to, before I get in too deep.

This man is corrupting the next generation…

Cay Horstmann, Professor of CS at San Jose State University, Sun Java Champion, and consultant in Internet Programming, says

I perform an occasional unit test after I’ve encountered a failure that I don’t want to have recur, but I rarely write the tests first. If so many experienced developers don’t write unit tests, what does that say? Maybe they would be even better developers if they followed Heinz’s advice. Maybe they don’t make many mistakes that unit tests would catch because they’re already experienced. The truth is probably somewhere in between.

What that says is that we work in a horribly inefficient industry where too many developers spend their time fixing bugs (using the debugger) sent back upstream by the testers, and it looks like there’s evidence to prove it.

via Kerry Jones

Tips on Test-Driven Development

Just found this useful guide:

Anyone new to [TDD] should begin with a partner who is more experienced, not only because it is safer but also because one can learn from the other’s experience.
Before starting the [TDD], it is important to have a general picture of the route and how it should be [TDD]’d. If the [TDD] is long or has a crux, you should try to pick out in advance suitable spots to rest.
Three-point contact with the [code] is an important rule in [TDD] and should be followed where possible. […] You should always have eye contact with your next [refactoring] before you start your move. Scrabbling blindly about for a [refactoring] is a waste of energy. Once started, each move should be short, decisive and smoothly executed. Long moves will tire you out quickly and beginners need to spare strength for the final section of the [TDD].

From p. 17 “Rock Climbing Basics”, Johnston and Hallden, Stackpole Books, 1995

TDD: fewer bugs to production, longer to write

This paper[1] from Microsoft’s Empirical Software Measurement group reports that

Case studies were conducted with three development teams at Microsoft and one at IBM that have adopted TDD. The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15-35% increase in initial development time after adopting TDD.

Reading the small print, these were not “pure” TDD teams, since they did a lot of the requirements and design work up-front. Still, it’s a nice data point suggesting that, once the cost of fixes is taken into account, it’s worth taking the time to get it right.

[1] “Realizing quality improvement through test driven development: results and experiences of four industrial teams”, Nachiappan Nagappan, Michael Maximilien, Thirumalesh Bhat, Laurie Williams. Empirical Software Engineering Journal, Volume 13, Number 3, pp. 289-302, Springer 2008

The reactionary voice of TDD

So, by one reading of Roy’s posting, I appear to have become the reactionary voice of TDD, a sort of technical William F. Buckley, which is not something I aspire to.

The question is how to get all those developers to do TDD when, apparently, they don’t know the basics of software design. It seems the answer is to separate the two: teach unit testing while ignoring the issue of code quality, in the expectation that people will catch up in the end. I absolutely agree that it’s important not to overload students, especially after reading this paper[1], but this solution doesn’t make sense to me.

My experience is that unit testing ill-structured code is an adventure in software callisthenics, requiring feats of special deviousness; Michael Feathers wrote a whole book about how to do it. Tools that slice through the runtime may sometimes be necessary, but dangerous power tools should only be used by trained staff[2]. This is fixing the wrong problem.

My suggestion is to start by improving the design skills of your team; that’s your biggest problem. There’s so much material available, all well-established; for example, you could buy a copy of Object Design and some index cards. There’s a catch, of course, in that this is not a shiny new technique, so funding for actually training staff is harder to justify, but you know how to work around that.

In the meantime, write an automated build and some acceptance tests around the system—not for everything, but just enough to show that the application will do the basics when you start it up. For many teams, this would be a significant advance. When this has bedded in and the team has begun to write objects that tell each other what to do, it will be time to move to a higher astral plane (my shamanic rates are quite reasonable) and start the first stumbling steps in TDD.

The alternative is to do TDD without understanding and, from what I’ve seen, that ends up with exactly what some people complain about: twice as much code for no benefit. My guess is that a brittle, incomprehensible test suite will not survive the first deadline crisis.

The irony of all this is that, despite my apparently being part of a consultants’ cabal that is protecting its elitist interests[3], this approach is likely to be cheaper, since material on good OO design is a commodity nowadays and deferring the adoption of full-scale TDD will be less disruptive.



  1. via Mark Guzdial’s excellent blog
  2. I love this quotation from the article: “And the number one most dangerous power tool in your wood shop is YOU.”
  3. Is this where I turn into Obama?

“Hammers considered harmful”

Here’s another post along the lines of: “Hammers considered harmful. Every time I use one, it strips the threads from my screws.” One of the clues is in the list of symptoms at the end of the first paragraph: “mammoth test set-ups”. The tests were complaining but not being heard.

In truth, we’ve done a dreadful job of explaining where interaction-based techniques are relevant and where they aren’t. I keep bumping into codebases that are supposed to be written that way but where the unit tests have baroque, inflexible setups because the team weren’t listening to the tests. I even saw Lasse Koskela, who knows what he’s doing, during a programming demo at the recent Agile Conference, slip into writing expectations for a simple clock object that should have just returned a series of values; J.B. Rainsberger, being more forthright than me, called him on it.
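The clock example can be made concrete without any mocking framework at all: an object that just returns a canned series of values is a simple stub, not a collaboration worth setting expectations on. Here is a minimal sketch; the StubClock name and API are mine, not from the demo:

```java
import java.util.Arrays;
import java.util.Iterator;

// A hand-rolled stub clock that hands back a fixed series of times.
// No expectations are needed, because nothing interesting depends on
// *how* the clock is called, only on the values it returns.
class StubClock {
    private final Iterator<Long> times;

    StubClock(Long... times) {
        this.times = Arrays.asList(times).iterator();
    }

    long now() {
        return times.next();
    }
}

public class StubClockSketch {
    public static void main(String[] args) {
        StubClock clock = new StubClock(1000L, 2000L, 3000L);
        System.out.println(clock.now()); // 1000
        System.out.println(clock.now()); // 2000
    }
}
```

A test using such a stub stays short and declarative: the canned values appear right where they are used, with no expectation-setting ceremony around them.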

Romilly, one of my collaborators, once said he was surprised, when he started working with Nat and me, at how simple our unit tests are and how few expectations we set. That degree of focus is one of the points we try to get across whenever we talk about our approach. I find that the best use of interaction-based testing is to nudge me into thinking about objects and their relationships, not as a single solution for all my TDD needs.

In the meantime, we’re working on making our ideas more accessible.

Another reason for licensing programmers

Update: The paper has now been officially published at http://www.computer.org/portal/web/computingnow/0110/whatsnew/software



Stuart Wray has posted a draft of an interesting paper, “How does Pair Programming Work?”. I can’t judge all the psychological claims, but many of the points appeal to my confirmation bias.

In “Mechanism 3”, Wray makes a link between code-n-fix programming and a form of “operant conditioning” otherwise known as gambling. The academically respectable quotation comes from Gleitman et al.:

In a [Variable Ratio] schedule, there is no way for the animal to know which of its responses will bring the next reward. Perhaps one response will do the trick, or perhaps it will take a hundred more. This uncertainty helps explain why VR schedules produce such high levels of responding in humans and other creatures. Although this is easily demonstrated in the laboratory, more persuasive evidence comes from any gambling casino. There, slot machines pay off on a VR schedule, with the “reinforcement schedule” adjusted so that the “responses” occur at a very high rate, ensuring that the casino will be lucrative for its owners and not for its patrons.

This matches some of my experience of programming, and is a pretty good explanation of some of the behaviour I see around me: the continual hope that the next fix will be the right one, the explosions of frustration when something doesn’t work, and the endless sessions. We’ve even strengthened the addictiveness with our taste for darkened rooms with flashing lights (like casino floors) and our highly responsive IDEs.

Wray hypothesises that Pair Programming may help to manage this addiction by keeping us honest. I suspect that Test-Driven Development also helps by smoothing the flow, removing the Variable Ratio effect, and, as I wrote before, by breaking us out of the programming Tar Pit long enough to refocus on the real goal.

Perhaps we programmers should be licensed, not for the sake of our unlucky customers but to preserve our own health.