25 audio files of about half an hour each, containing the most famous interview ever given by the master of suspense. In the fall of 1962, François Truffaut carried out extensive…
We discuss some downsides of testing and TDD: can you do too much testing, and is there a problem with teams valuing tests more than they value the functional code?
David starts by saying "to talk about trade-offs, you really have to understand the drawbacks, because if there are no drawbacks there are no trade-offs." He continued by saying that TDD doesn’t force you to do things, but it does nudge you in certain directions. The first issue he wanted to raise was over-testing. It’s often said you shouldn’t write a line of code without a failing test. At first this seems reasonable, but it can lead to over-testing, such as when there are four lines of test code for every line of production code, which means that when you need to change behavior, you have more code to change. Kent has said "you aren’t paid to write tests, you just write enough to be confident" - so he asked whether Kent and I write tests before every line of production code.
Kent replied "it depends, and that’s going to be the beginning to all of my answers to any question that’s interesting". With JUnit they were very strict about test-first and were very happy with how it turned out - so he doesn’t think you always get over-testing when you use TDD. Herb Derby came up with the notion of delta coverage - what coverage does this test provide that’s unique? Tests with zero delta coverage should be deleted unless they provide some kind of communication purpose. He said he’d often write a system-y test, write some code to implement it, refactor a bit, and end up throwing away the initial test. Many people freak out at throwing away tests, but you should if they don’t buy you anything. If the same thing is tested multiple ways, that’s coupling, and coupling costs.
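Herb Derby's notion of delta coverage can be sketched in a few lines. This is an illustrative implementation, not any real tool's API; the test names and line sets are made up. A test's delta coverage is the set of lines it exercises that no other test exercises:

```ruby
require 'set'

# For each test, subtract the union of every *other* test's covered
# lines; what remains is the coverage only this test provides.
def delta_coverage(coverage_by_test)
  coverage_by_test.map do |test, lines|
    others = coverage_by_test.reject { |t, _| t == test }
                             .values
                             .reduce(Set.new, :|)
    [test, lines - others]
  end.to_h
end

# Hypothetical per-test line coverage:
coverage = {
  "test_parse"  => Set[1, 2],
  "test_render" => Set[2, 3],
  "test_smoke"  => Set[2]     # everything it covers is covered elsewhere
}

delta_coverage(coverage)
# "test_smoke" has an empty delta, so by Kent's rule it is a deletion
# candidate unless it serves a communication purpose.
```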
I said that I’m sure there is over-tested code; indeed, if anyone over-tests it would be ThoughtWorks, since we have a strong testing culture. It’s hard to get the amount just right: sometimes you’ll overshoot and sometimes undershoot. I would expect to overshoot from time to time, and it’s not something to worry about unless it’s too large. On the test-every-line-of-code point I ask the question: "if I screw up this line of code, is a test going to fail?" I sometimes deliberately comment a line out or reverse a conditional and run the tests to ensure one fails. My other mental test (from Kent) is to only test things that can possibly break. I assume libraries work (unless they are really wonky), so I ask whether I can mess up my use of the library and how critical the consequences of such a mistake would be.
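The "reverse a conditional" check can be illustrated with a tiny hypothetical example (the method and threshold here are invented, not from the discussion): if flipping the comparison doesn't make any test fail, the conditional isn't really being tested.

```ruby
def eligible_for_discount?(total)
  total >= 100
end

# The deliberate mutation: same logic with the conditional reversed.
def eligible_for_discount_mutated?(total)
  total < 100
end

# A minimal stand-in for a test suite: eligible above the threshold,
# not eligible below it.
passes = ->(f) { f.call(150) && !f.call(50) }

passes.call(method(:eligible_for_discount?))          # => true
passes.call(method(:eligible_for_discount_mutated?))  # => false - the suite catches the mutation
```

If the mutated version still passed, that line would effectively be untested.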
Kent declared that the ratio of lines of test code to lines of production code was a bogus metric. A formative experience for him was watching Christopher Glaeser write a compiler: he had 4 lines of test code for every line of compiler code, but that’s because compilers have lots of coupling; a simpler system would have a much smaller ratio. David said that to detect any commented-out line of code implies 100% test coverage. Thinking about what can break is worth exploring: Rails’s declarative statements don’t lead to enough breakage to be worth testing, so he’s comfortable with significantly less than 100% coverage.
I replied that "you don’t have enough tests (or good enough tests) if you can’t confidently change the code," and "the sign of too much is whenever you change the code you think you expend more effort changing the tests than changing the code." You want to be in the Goldilocks zone, but that comes with the experience of knowing what mistakes you and your team tend to make and which ones don’t cause a problem. I said I like the "can I comment out a line of code" approach when I’m unsure of my ground; it’s a starting place, but as I work more in an environment I can come up with better heuristics. David felt that this tuning is different for stable product teams than for consulting teams that are handing the code over to an unknown team and thus need more tests. Kent said that it’s good to learn the discipline of test-first; it’s like a 4WD-low gear for tricky parts of development.
David introduced the next issue: many people used to think that documentation was more important than code; now he’s concerned that people think tests are more important than functional code. Connected with this is an under-emphasis on the refactor part of the TDD cycle. All this leads to insufficient energy devoted to refactoring and keeping the code clear. Kent described how he had just been through an episode where he threw away some production code but kept the tests and reimplemented it. He really likes that approach, as the tests tell him whether the new code is working. This leads to an interesting question: would you rather throw away the code and keep the tests, or vice-versa? In different situations you’d answer that question differently.
I said I’d found situations where reading the tests helped me understand what the code was doing. I didn’t think one was more important than the other - the whole point is the double check: an error is signalled when the two don’t match. I agreed with David that I’d sometimes sensed teams making the bad move of putting more energy into the testing environment than into supporting the user; tests should be a means to that end. I find I get a dopamine shot when I clarify code, but my biggest thrill is when I have to add a feature, think it will be tricky, but it turns out easy. That happens due to clean code, but there is a distance between cleaning the code and getting the dopamine shot. Kent showed a metaphor for this from Jeff Eastman that is too tricky to describe in text. He gets his rush from big design simplifications. He feels that it’s easy to explain the value of a new test working, but hard to state the value of cleaning the design.
David said we often focus on things we can quantify, but you can’t reduce design quality to a number - so people prioritize things that are low on the list, like test speed, coverage, and ratios. These things are honey traps, and we need to be aware of their siren calls. Cucumber really gets his goat: it glorifies the testing environment rather than production code, and is only useful in the largely imaginary sweet spot of writing tests with non-technical stakeholders. It used to be important to sell TDD, but now that it has conquered all, we need to explore its drawbacks. I disagreed that TDD was dominant, hearing of many places where it’s yet to gain traction.
David feels that using TDD leads to approaches such as hexagonal Rails, which he considers test-induced design damage due to the complexity of excessive indirection. Kent thinks it’s less about TDD and more about the quality of design decisions.
I opened with some questions. Can TDD lead to design damage? Is the resulting damage really damage? How do we judge whether a design is damaged or not? David described the gist he posted earlier. It’s an example of the kind of architecture he sees people arriving at with TDD when using lots of mocks. Each layer of the application is separated, eg a controller can be tested without talking to real models, databases, or the request/response cycle. What matters to David isn’t the specific example so much as the unnecessary indirection and complexity required to make the code easier to test in isolation.
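The isolation style under discussion can be sketched in miniature (the class and method names here are hypothetical, not from David's actual gist): the controller depends on an injected repository, so a test can pass in a stub instead of real models or a database.

```ruby
# The controller never touches the model layer directly; it goes
# through whatever repository object it was constructed with.
class LookupController
  def initialize(repository)
    @repository = repository
  end

  def show(id)
    record = @repository.find(id)
    record ? { status: 200, body: record[:name] } : { status: 404 }
  end
end

# In a test, a plain in-memory stub stands in for the model layer:
class StubRepo
  def initialize(records)
    @records = records
  end

  def find(id)
    @records[id]
  end
end

controller = LookupController.new(StubRepo.new({ 1 => { name: "Alice" } }))
controller.show(1)  # => { status: 200, body: "Alice" }
controller.show(2)  # => { status: 404 }
```

The point of contention is whether this extra repository seam earns its keep outside the tests, or is pure indirection added to enable isolation.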
Kent said that ascribing test-induced damage to TDD was like driving a car to a bad place and blaming the car for it. The design David showed wasn’t due to TDD, the real issue is that these indirections are all good tricks under some circumstances and we need to understand whether they are worth the cost or not. David disagreed, saying that once you jump on the TDD horse (or car) it encourages you to go a certain way - it leads to a monstrosity one test at a time. Kent countered that it was rather one design decision at a time. TDD puts an evolutionary pressure on a design, people have different preferences for the grain-size of how much is covered by their tests.
Kent asked David what kinds of thing he wanted to do with the gist that its structure made hard. ("If it’s just sitting there who cares - it’s when I want to change it that the design actually matters"). David replied that there’s a direct correlation between the size of code and how easy it is to change it. All these indirections have to be kept in sync, something that’s 10 lines of code is easier to understand and change than something that’s 60 lines of code. Every layer of indirection introduces a high cost. David continued by saying that TDD’s red/green/refactor flow was very addictive (Kent observed that he’s the poorest drug dealer on the planet) and this addiction led people to these poor decisions. I disagreed with this, saying it wasn’t due to TDD but due to a desire for isolation, the essence of a hexagonal architecture being isolation from its environment (in this case Rails).
David said the reason people want isolation is TDD; he’d heard various arguments for isolation, but only the testing one made sense. He thinks the idea that someone would want to turn a Rails application into a command-line application is so rare it’s laughable; similarly, you can’t just swap out an in-memory store for a call to a web service, because they have different operational characteristics. These swappability pipedreams aren’t the real goal - the real goal is isolated testing. Kent agreed that you can’t treat in-memory and web services the same ("you may think you’re decoupled, but you’re really, really not") as the failure cases are different. The boundaries between elements will leak to some degree: "the question is how much are we willing to spend to get how much decoupling between elements".
Kent saw the difference between 10 lines of code and 60 as a cohesion issue. David agreed, but argued that cohesion and coupling are often opposed, and that higher coupling is usually worth the price to get better cohesion. Kent observed that there are other ways to eliminate external dependencies: you can also use intermediate results, which is what happens with compilers. Something that’s hard to test is an indication that you need a design insight; it’s often useful to get up and take a walk to find those insights that lead to better designs that are also more testable. David agreed that testing can lead to better designs, but said his experience was often the opposite: that there wasn’t a good testable design to be found. Kent accused David of not having enough self-confidence - maybe you can’t see the insight today, so you have to make progress in the meantime, but he’s optimistic that he will find it eventually. David dismissed this as "faith-based TDD" - he used to feel this way, but got stuck in a depressing loop searching for an ideal solution that wasn’t there. Kent clarified he wasn’t talking about TDD but about software design in general; it’s not about TDD, it’s about how to get feedback. Thinking about software design is the thing, because it pays off so big when you get a good design insight. Getting these insights isn’t about your workflow; it’s about things like knowing when to work and when to rest, gathering influences from other places, and collaborating with other people.
We finished by saying our theme next time would be to explore the trade-offs around how you seek out feedback while programming.
We discuss the various ways in which we get feedback while programming and the role of QA in providing feedback to developers.
Kent opened by saying that decisions involving TDD were about trade-offs: "in some ideal world we would have instant, infallible feedback about our programming decisions"… "every keystroke that I make, if the code is ready to deploy, it would just instantly deploy." But that ideal is impossible at the moment, so the question is how far we back off from it. He went on to enumerate several constraints in the trade-off.
Frequency: how rapidly do we want our feedback? Fidelity: how accurate do we want the red/green signal to be? Overhead: how much are we prepared to pay? Lifespan: how long is this software going to be around - which is a matter of probability as well as time. Those four are the constraints he thinks we need to compare. "We’re not in this hangout to agree - my personal goal is just to understand the set of trade-offs by articulating them to people who are prepared to tear my ideas apart in a constructive way"
I outline three things that we look to get feedback on.
Is the software doing something useful for the user of the software? Sometimes tests help with this (eg payroll calculations) and sometimes not (eg html rendering). Have I broken anything? "This is where self testing code… is such a lifesaver." I want to see every test fail at least once. Is my code-base healthy? This is so I can continue to build things quickly. This element gets trickier when you’re not sure who will take over the code. David then introduced the topic that TDD’s success has led to a neglect of QA. Many shops took on TDD and got rid of QA; Basecamp didn’t have QA until a couple of years ago. He thinks TDD got programmers to where "they got so over-confident that they felt they didn’t need QA". While the old model was broken, the pendulum has swung too far: "I don’t think you can work on anything of material quality and produce great software without having somebody who’s not you test it." This is disappointing because he’s seen how powerful it is to have a QA person come in.
The other issue is that to understand trade-offs you have to understand the costs, and all the talk of TDD has been about its benefits. This neglect of costs is why people cannot comprehend that there is such a thing as test-induced damage. The trade-off continuum applies to other things too. Consider the cost of reliability: going from 99% to 99.999% is exponentially more expensive than getting to 99% in the first place. We must also consider criticality: high reliability is important for space shuttles and pacemakers, but wrong for an exploratory web site. The rule of not writing a line of production code without a test doesn’t fit with trade-offs around criticality.
Kent wanted to go back to the issue of QA; he considered the old relationship with QA dysfunctional. His one piece of Facebook swag in his office is a poster that says "Nothing at Facebook is somebody else’s problem", and he feels Facebook follows that remarkably well for a company its size. Facebook didn’t have QA until recently, and programmers live up to that responsibility. "It’s a question of ‘compared to what?’" Compared to having effective QA, no-QA is worse; but no-QA is better than the old dysfunctional relationship. I added that at ThoughtWorks we almost always have QA on our projects. I also feel that the big shift from the 90s is not just getting rid of the dysfunctional adversarial relationship, but also getting rid of manual scripted tests. And it’s liberating that startups can operate without QA. David agreed that it can be good to mindfully trade off QA for initial speed, but some have taken programmer testing too far and don’t see the value of exploratory testing. If developers think they can create high-enough quality software without QA they are wrong: your tests may be green, but once the software is in production, users do things you don’t expect.
David says that worst of all is when developers are not part of customer service. Many programmers don’t want to be on-call because it’s drudgery, but it’s also a feedback loop. Code with green tests can be a plateau that’s below where you want to be. Kent suggested that we should stipple a few red pixels into the green bar to remind us of these limitations. "The on-call is the feedback loop that teaches you what tests you didn’t write." Facebook programmers have to go on-call; everyone complains about it, but there’s no way they are going away from it. As soon as you think you don’t make mistakes any more, that’s a mistake, and you stop growing. Eventually "the world won’t let you pretend that you’re not screwing up any more." He’d rather pay the price of catching that early with a phone call at 2am.
I finish by observing we didn’t get to talk more about the costs of testing (David’s second point from earlier) so propose that we look at that next time. I also mention that by chance there’s an article by Mike Bland published on my site today that looks at exactly that topic.
We talk about our varying experiences with the flow of TDD, and the way TDD and self-testing code are often confused.
David opened the discussion by raising his three major issues with TDD and unit testing: confusion over the definitions of TDD and unit testing, test-induced damage through using mocks to drive architecture, and how the red/green/refactor cycle of TDD never worked for him. I commented that to understand where TDD and its kin came from, it’s useful to understand the history, so Kent explained the origins of TDD: trying things out in Smalltalk and finding that TDD worked well for his personality.
I commented that when we first worked together at C3, we didn’t start with TDD, but with ensuring each programming episode delivered code and tests together. Kent said that programmers deserve to feel confident that their code works, and TDD is one (not the only) way to reach that. David embraces Ruby’s design goal of programmer happiness, and is on board with the notion that you’re not done until you have tests - but doesn’t like TDD as a way to get there. He thinks people have different brains and thus like different techniques and languages, and he doesn’t like that TDD gets conflated with the confidence you get from self-testing code.
Kent talked about a recent hackathon at Facebook, in about half of which he could use TDD, while the other half wasn’t suitable. In the TDDable code he found an enjoyable flow; he found the other part more tricky, but still used regression tests and short feedback loops. He has no problem mixing both styles - it’s like playing both classical and jazz. TDD reminds him of how he learned mathematics at school: always needing examples.
David has been in situations where TDD flowed well, but most of his work isn’t like that; his question is what are you willing to sacrifice to get that flow? Many people make bad trade-offs, especially with heavy mocking. Kent thinks it’s about trade-offs: is it worth making intermediate results testable? He used the example of a compiler, where an intermediate parse tree makes a good test point and is also a better design. But in response to David’s question about mocks, he said he rarely uses them; he’s concerned that those who do often find refactoring difficult, while he finds testing makes refactoring easier.
I commented that there are two problems with terminology where different things get conflated: first, that DHH’s critique of TDD was based on an assumption that you have to use heavy mocking in TDD, which isn’t the case; second, that there is a difference between self-testing code and TDD - TDD is one way to achieve self-testing code. David said his reaction was to seeing people describe mock-heavy TDD as the moral way to work, with the result being a lot of code that was poorly designed by its desire to enable isolated unit tests.
I finished by playing time-cop and saying that in the next session we’ll explore how TDD may lead to damage, if that is really damage, and how we can judge if it’s damage.
This interview is 1 of a series of 6 interviews for The Complete Guide to Rails Performance, discussing Ruby and Rails performance with community leaders like Mike Perham, Sean Griffin, Evan Phoenix, Richard Schneeman and more. To view the other interviews and much more, visit https://railsspeed.com to purchase The Complete Guide to Rails Performance.
What started out as a conversation over lunch (https://twitter.com/timoreilly/status/889976206355046403) made its way to our community as a round table discussion between Tim O’Reilly (Founder and CEO of O’Reilly Media), Jason Fried and David Heinemeier Hansson (Founders of Basecamp, CEO and CTO, respectively) and our own Bryce Roberts (co-founder and managing director of O’Reilly AlphaTech Ventures). Basecamp and O’Reilly Media have been in business for nearly two and four decades, respectively. Both companies remain independent and wildly profitable to this day, having built their businesses on their own terms from day one. In this 1-hour live discussion, the group covered topics ranging from what it’s like to run a tech business outside of Silicon Valley to much of the confusion and clumsiness we introduce to our lives by conforming to someone else’s definition of what running a successful and impactful business looks like.
Dan is joined by David Heinemeier Hansson, Shlok Vaidya, and Producer Haddie Cooke to discuss telecommuting vs. being in the office.
We answer a couple of questions from our viewers and wrap up our thoughts on this topic.