Steve Freeman

Test-Driven

Java synchronisation bug on OS/X?

I’ve come across what might be a synchronisation bug while working on the book.

The end-to-end tests for our example application use the WindowLicker framework to drive the Swing user interface. Our test infrastructure starts the application up in another thread (it’s as close as we can get to running from the command line), then creates a WindowLicker driver which, eventually, creates a Java AWT Robot. It turns out (we think) that this means that we have two threads trying to load and initialise the AWT library in parallel, which hangs. Our workaround is to call a delaying method before creating the WindowLicker Driver:

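// Make sure AWT is loaded and its event queue is running
// before the driver is created.
private void makeSureAwtIsLoadedBeforeStartingTheDriverOnOSXToStopDeadlock() {
  try {
    SwingUtilities.invokeAndWait(new Runnable() {
      public void run() {
        // deliberately empty: we only need the event queue to start
      }
    });
  } catch (Exception e) {
    throw new Defect(e);
  }
}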

That’s not really what invokeAndWait() is for, but it solves our problem until we can find a better answer, and we hope that the hack is at least self-explanatory.

Does anyone have a better explanation or fix? OS/X 10.5.6, Java 1.5.0_16, White MacBook 2.4 GHz Intel Core Duo. Nat’s Linux installation works fine.

Mock Roles not Objects, live and in person.

At the recent Software Craftsmanship conference in London, Willem and Marc ran a session on Responsibility-Driven Development with Mocks for about 30 people. Nat Pryce and I were sitting at the back watching and occasionally heckling.

The first striking thing was that when Willem and Marc asked who was using “Mock Objects” most everyone put their hand up (which was nice), but then only a handful also said they were thinking about Roles and Responsibilities when they did (which was frustrating). We first wrote up these ideas in our paper “Mock Roles Not Objects” and much of the difficulty we see people have with the technique of Mock Objects comes from focussing on classes rather than relationships.

As it happens, an example popped up in the rest of the session, which was run as a Coding Dojo. What was interesting to me was how the group managed to turn around its design ideas. Here’s what I can remember about how it worked out.

The domain was some kind of game, with a hero who moves around an environment slaying dragons and so forth. The first couple of stories were to do with displaying the current room, and then moving from one room to another. It was a little difficult getting started because the limitations of the event didn’t allow enough time to really drive the design from outer-level requirements, but the group managed to get started with something like:

describe Hero do
  it "should describe its surroundings" do
    # test doubles for the hero's collaborators
    room = mock("room")
    console = mock("console")
    hero = Hero.new(room)

    room.stub!(:description).and_return("a room with twisty passages")

    console.should_receive(:show).with("in a room with twisty passages")
    hero.look(console)
  end
end

The expectation here says that when looking, the hero should write a text describing the room to the console. This was a place to start, but it doesn’t look right. Why is a hero attached to a room? And hero.look(console) just doesn’t read well; it’s hard to tell what it means. The tensions became clearer with the next feature, which was to have the hero move from one room to another. If we write

hero.move_to(other_room)

how can we tell that this has worked? We could ask the hero to look() again, but that means making an extra call for testing, which is not related to the intent of the test. We could ask the hero what his current room is, but that’s starting to leak into Asking rather than Telling. There may be a need for the hero to hold on to his current location, but we haven’t seen it yet.

Suddenly, it became clear that the dependencies were wrong. We already have a feature that can be told about the hero’s situation, which we can build on. If the feature were to be told about what is happening to the hero, we could use that to detect the change in room. So, our example now becomes:

describe Hero do
  it "should move to a room" do
    # test doubles for the hero's collaborators
    room = mock("room")
    console = mock("console")
    hero = Hero.new(console)

    room.stub!(:description).and_return("a room with twisty passages")

    console.should_receive(:show).with("in a room with twisty passages")

    hero.move_to(room)
  end
end

That’s better, but it’s not finished. The term Console sounds like an implementation, not a role. Most of the sword-wielding adventurers that I know don’t know how to work a Console, but they’re quite happy to tell of their great deeds to, say, a Narrator (as David Peterson suggested). If we adjust our example, we get:

describe Hero do
  it "should move to a room" do
    # test doubles for the hero's collaborators
    room = mock("room")
    narrator = mock("narrator")
    hero = Hero.new(narrator)

    room.stub!(:description).and_return("a room with twisty passages")

    narrator.should_receive(:says).with("in a room with twisty passages")

    hero.move_to(room)
  end
end

The whole example now reads as if it’s in the same domain, in the language of a D&D game. It doesn’t refer to implementation details such as a Console; we might see that code when we get to the detailed implementation of a Narrator. Obviously, there’s a lot more we could do (for a start, I’d like to see more structured messages between Hero and Narrator), but the session ran out of time at about this point.
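For anyone who wants to see what might sit behind that last example, here is a minimal sketch in plain Ruby. It is not the code from the session: the body of Hero, the RecordingNarrator stand-in and the Struct used for the room are all assumptions made for illustration. Only the names Hero, move_to, says and description come from the example itself.

```ruby
# Hypothetical stand-in for the Narrator role: it records what it is told,
# so a test can see the visible effect of moving the hero.
class RecordingNarrator
  attr_reader :lines

  def initialize
    @lines = []
  end

  def says(text)
    @lines << text
  end
end

# Hypothetical room: anything that responds to #description would do.
Room = Struct.new(:description)

# A guess at the smallest Hero that satisfies the example. The hero is told
# to move and tells the narrator what happened; nobody asks the hero for
# its state.
class Hero
  def initialize(narrator)
    @narrator = narrator
  end

  def move_to(room)
    @narrator.says("in #{room.description}")
  end
end

narrator = RecordingNarrator.new
hero = Hero.new(narrator)
hero.move_to(Room.new("a room with twisty passages"))
puts narrator.lines.inspect
```

Note that this Hero does not hold on to its current room at all, which matches the point that the need for it had not yet appeared.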

Some lessons:

  1. Naming, naming, naming. It’s the most important thing. A coherent unit of code should have a coherent vocabulary; it should read well. If not, I’m probably mixing concepts, which will make the code harder to understand and more brittle to change than it needs to be.
  2. When I’m about to write a test, I ask “if this were to work, who would know?”. That’s the most revealing question in B/TDD. If there’s no visible effect from an event, except perhaps for changing a field in the target object, then maybe it’s worth waiting until there is a visible effect, or maybe there’s a concept missing, or maybe the structure isn’t quite right. Before writing more code, I try to make sure I understand its motivation.

Willem’s (and many other people’s) approach is slightly different. He likes to explore a bit further with the code before really sorting out the names, and he’s right that there’s a risk of Analysis-Paralysis. I do that occasionally, but my experience is that the effort of being really picky at this stage forces me to be clearer about what I’m trying to achieve, to ask those questions I really ought to have answers to, before I get in too deep.

This man is corrupting the next generation….

Cay Horstmann, Professor of CS at San Jose State University, Sun Java Champion, and consultant in Internet Programming, says

I perform an occasional unit test after I’ve encountered a failure that I don’t want to have recur, but I rarely write the tests first. If so many experienced developers don’t write unit tests, what does that say? Maybe they would be even better developers if they followed Heinz’s advice. Maybe they don’t make many mistakes that unit tests would catch because they’re already experienced. The truth is probably somewhere in between.

What that says is that we work in a horribly inefficient industry where too many developers spend their time fixing bugs (using the debugger) sent back upstream by the testers, and it looks like there’s evidence to prove it.

via Kerry Jones

Tips on Test-Driven Development

Just found this useful guide:

Anyone new to [TDD] should begin with a partner who is more experienced, not only because it is safer but also because one can learn from the other’s experience.
Before starting the [TDD], it is important to have a general picture of the route and how it should be [TDD]’d. If the [TDD] is long or has a crux, you should try to pick out in advance suitable spots to rest.
Three-point contact with the [code] is an important rule in [TDD] and should be followed where possible. […] You should always have eye contact with your next [refactoring] before you start your move. Scrabbling blindly about for a [refactoring] is a waste of energy. Once started, each move should be short, decisive and smoothly executed. Long moves will tire you out quickly and beginners need to spare strength for the final section of the [TDD].

From p. 17 “Rock Climbing Basics”, Johnston and Hallden, Stackpole Books, 1995

TDD: fewer bugs to production, longer to write

This paper[1] from Microsoft’s Empirical Software Measurement group reports that

Case studies were conducted with three development teams at Microsoft and one at IBM that have adopted TDD. The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15-35% increase in initial development time after adopting TDD.

Reading the small print, these were not “pure” TDD teams, since they did a lot of the requirements and design work up front. Still, a nice data point that suggests that, taking the cost of fixes into account, it’s worth taking the time to get it right.

1) “Realizing quality improvement through test driven development: results and experiences of four industrial teams” Nachiappan Nagappan, Michael Maximilien, Thirumalesh Bhat, Laurie Williams. Empirical Software Engineering Journal, Volume 13, Number 3, pp. 289-302, Springer 2008

The reactionary voice of TDD

So, by one reading of Roy’s posting, I appear to have become the reactionary voice of TDD, a sort of technical William F. Buckley, which is not something I aspire to.

The question is how to get all those developers to do TDD when, apparently, they don’t know the basics of software design. It seems the answer is to separate the two, to teach unit testing while ignoring the issue of code quality in the expectation that people will catch up in the end. I absolutely agree that it’s important not to overload students, especially after reading this paper[1], but this solution doesn’t make sense to me.

My experience is that unit testing ill-structured code is an adventure in software callisthenics, requiring feats of special deviousness; Michael Feathers wrote a whole book about how to do it. Tools that slice through the runtime may sometimes be necessary, but dangerous power tools should only be used by trained staff[2]. This is fixing the wrong problem.

My suggestion is to start by improving the design skills of your team; that’s your biggest problem. There’s so much material available, all well-established; for example, you could buy a copy of Object Design and some index cards. There’s a catch, of course, in that this is not a shiny new technique, so funding for actually training staff is harder to justify, but you know how to work around that.

In the meantime, write an automated build and some acceptance tests around the system: not for everything, but just enough to show that the application will do the basics when you start it up. For many teams, this would be a significant advance. When this has bedded in and the team has begun to write objects that tell each other what to do, it will be time to move to a higher astral plane (my shamanic rates are quite reasonable) and take the first stumbling steps in TDD.

The alternative is to do TDD without understanding and, from what I’ve seen, that ends up with exactly what some people complain about: twice as much code for no benefit. My guess is that a brittle, incomprehensible test suite will not survive the first deadline crisis.

The irony of all this is that, despite my apparently being part of a consultants’ cabal that is protecting its elitist interests[3], this approach is likely to be cheaper, since material on good OO design is a commodity nowadays and deferring the adoption of full-scale TDD will be less disruptive.



  1. via Mark Guzdial’s excellent blog
  2. I love this quotation from the article: And the number one most dangerous power tool in your wood shop is YOU.
  3. Is this where I turn into Obama?

“Hammers considered harmful”

Here’s another post on the lines of: “Hammers considered harmful. Every time I use one, it strips the threads from my screws.” One of the clues is in the list of symptoms at the end of the first paragraph: “mammoth test set-ups”. The tests were complaining but not being heard.

In truth, we’ve done a dreadful job of explaining where interaction-based techniques are relevant and where they aren’t. I keep bumping into codebases that are supposed to be written that way but where the unit tests have baroque, inflexible setups because the team weren’t listening to the tests. I even saw Lasse Koskela, who knows what he’s doing, during a programming demo at the recent Agile Conference, slip into writing expectations for a simple clock object that should have just returned a series of values; J.B. Rainsberger, being more forthright than me, called him on it.
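To make the clock point concrete, here is a rough sketch in plain Ruby of a clock that simply returns a series of values. The names ScriptedClock and Stopwatch are made up for this illustration and have nothing to do with the code in the demo. The clock is a stub, so the test checks the result that comes out of the object under test rather than setting expectations on how the clock is called.

```ruby
# Hypothetical stub clock: hands back a fixed series of times, one per call.
class ScriptedClock
  def initialize(*times)
    @times = times
  end

  def now
    raise "clock script exhausted" if @times.empty?
    @times.shift
  end
end

# Hypothetical object under test: measures the time between start and stop.
class Stopwatch
  def initialize(clock)
    @clock = clock
  end

  def start
    @started_at = @clock.now
  end

  def stop
    @clock.now - @started_at
  end
end

stopwatch = Stopwatch.new(ScriptedClock.new(100, 130))
stopwatch.start
puts stopwatch.stop
```

The clock here is a source of values, not a collaborator the stopwatch tells things to, which is why a stub is enough and an expectation would only add noise to the test.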

Romilly, one of my partners in crime (or rather, collaborators), once said he was surprised, when he started working with Nat and me, at how simple our unit tests are and how few expectations we set. That degree of focus is one of the points we try to get across whenever we talk about our approach. I find that the best use of interaction-based testing is to nudge me into thinking about objects and their relationships, not as a single solution for all my TDD needs.

In the meantime, we’re working on making our ideas more accessible.

Another reason for licensing programmers

Update: The paper has now been officially published at http://www.computer.org/portal/web/computingnow/0110/whatsnew/software



Stuart Wray has posted a draft of an interesting paper, “How does Pair Programming Work?”. I can’t judge all the psychological claims, but many of the points appeal to my confirmation bias.

In “Mechanism 3”, Wray makes a link between code-n-fix programming and a form of “operant conditioning” otherwise known as gambling. The academically respectable quotation comes from Gleitman et al.:

In a [Variable Ratio] schedule, there is no way for the animal to know which of its responses will bring the next reward. Perhaps one response will do the trick, or perhaps it will take a hundred more. This uncertainty helps explain why VR schedules produce such high levels of responding in humans and other creatures. Although this is easily demonstrated in the laboratory, more persuasive evidence comes from any gambling casino. There, slot machines pay off on a VR schedule, with the “reinforcement schedule” adjusted so that the “responses” occur at a very high rate, ensuring that the casino will be lucrative for its owners and not for its patrons.

This matches some of my experience of programming, and is a pretty good explanation of some of the behaviour I see around me: the continual hope that the next fix will be the right one, the explosions of frustration when something doesn’t work, and the endless sessions. We’ve even strengthened the addictiveness with our taste for darkened rooms with flashing lights (like casino floors), and our highly responsive IDEs.

Wray hypothesises that Pair Programming may help to manage this addiction by keeping us honest. I suspect that Test-Driven Development also helps by smoothing the flow, removing the Variable Ratio effect, and, as I wrote before, by breaking us out of the programming Tar Pit long enough to refocus on the real goal.

Perhaps we programmers should be licensed, not for the sake of our unlucky customers but to preserve our own health.

Test-Driven Development. A Cognitive Justification?

It’s been a busy week. Michael Feathers has an interesting post on the nature of Test-Driven Development, to which Keith has responded. I think Michael overstated my position on “most” people (it was probably a bar discussion), but over the years I’ve seen a lot of TDD code that doesn’t look right. Incidentally, Tim Mackinnon, who was there, tells the story of the origin of Mocks at the bottom of this page.

With that out of the way, I’d like to get to the real point of this posting.

A Cognitive Justification for Test-Driven Development

Two influences coincided for me at XP2008 this week: Dave Snowden talking about social complexity, including current understanding of how the mind works, and Naresh Jain pairing to understand different people’s approaches to Test-Driven Development.

Dave has spent a lot of time exploring how decision-making happens. In particular, it turns out that people don’t actually spend their time carefully working out the trade-offs and then picking the best option. Instead, we employ a “first-fit” approach: work through an ordered list of learned responses and pick the first one that looks good enough. All of this happens subconsciously, then our slower rational brain catches up and justifies the existing decision—we can’t even tell it’s happening. Being an expert means that we’ve built up more patterns to match so that we can respond more quickly and to more complicated situations than a novice, which is obviously a good thing in most situations. It can also be a bad thing because the nature of our perception means that experts literally cannot receive certain kinds of information that falls outside their training, not because they’re inadequate people but because that’s how the brain works.

Part of Dave’s practice is concerned with breaking through what he calls this “Expert Entrainment”. He has developed exercises to shuffle our list of response patterns and allow other ideas to break through the crust of skills we’ve worked so hard to acquire. One motivation for doing this is to stop experts jumping to a known solution when they haven’t really understood the situation.

Naresh, meanwhile, is on a mission to pair program with the world to understand how different people approach Test-Driven Development, with an example problem that he uses with everyone. My preference these days is to start with a very specific example of the use of the system and then, as I add more examples, extract structure by refactoring. As we talked this through, Naresh described another programmer who noticed that the problem was an instance of a more general type of system and coded that up directly; there was nothing in his solution that included the language of the example. The other programmer had used his expertise to recognise an underlying solution and short-circuit the discovery process; that’s why we claim higher rates for experience. This programmer was right about his solution, so why did the leap to a design bother me (apart from my own Expert Entrainment)?

Then it struck me: Test-Driven Development, at least as practised by the school that I follow, progresses by focussing on the immediate, on addressing narrow, concrete examples. Don’t worry about all those ideas buzzing around your head for how the larger structure should be; just make a note and park them. For now, just do something to address this little concrete example. Later on, when you’ve gathered some empirical evidence, you can see if you were right and move the code in that direction.

I think what this means is that Test-Driven Development works (or should do) by breaking our first-fit pattern matching. It stops us being expert and steam-rolling over the problem with, literally, the first thing that came into our minds. It forces us out of our comfort zone long enough to consider the real requirements we should be addressing. Even better, starting with a test forces us to think first about the need (what’s the test for that?), and then about a solution that our expert mind is so keen to provide.

Just in case you missed that (and it took me a while to see it), it makes a cognitive difference whether you write the tests first or the code.

The best supporting evidence is Arlo Belshee’s group that implemented Promiscuous Pairing. They found empirically that they were most productive when switching pairs every couple of hours, contrary to what anyone would expect; their view was that they were taking advantage of constantly being in a state of “Beginner’s Mind”. Of course, to make TDD work in practice, we still need all that expertise underneath to draw on, but to support, not to control.

Personally, I’m constantly surprised at the interesting solutions that come up from being very focussed on the immediate and concrete, with a background awareness of the larger picture. By letting go, I discover more possibilities. Very Zen.

Doubtful metaphors (i)

TDD is Keyhole surgery for software

Whilst writing up an extended example of TDD, I was trying to be as incremental as possible, adding tiny little slices of behaviour all the way through the system: replace one component, get that working; show a connection established, get that working; show one field on the UI, get that working. I took the trouble to structure the changes so that the application was working nearly all the time, rather than having to rip it apart.

Before I learned how to make progress in such fine slices, I would have cracked open the whole codebase and made the changes in one sustained effort. That would have meant that I couldn’t stop until I finished, I couldn’t check in without branching, and that merging with the rest of the team would be unpleasant. Lots of coders talk about this in terms of “open-heart surgery”.

So maybe keyhole surgery is a better ambition as a metaphor. All the invasive work is focussed on the part that actually matters, rather than having to open up a route to get there. It takes a little more effort, but it’s less damaging to the patients, who recover much more quickly, so the procedure as a whole is safer and cheaper.


Jerry Weinberg has a post where he writes about how “surgery” may be a better term for what many software teams do than “maintenance”. It gives a better sense of the risks involved and why quick solutions implemented by juniors may not be the best approach.

Medical joke from when I shared a house with a medic. “How can you spot the laparoscopists at the pub?”, “They drink their beer through a straw.”