Random Thoughts: Measuring Technical Debt

This is a rambling-about-random-thoughts post. It goes a little out there. Have fun!

This post was triggered by the Lean Code talk at #xp2017, by Desmond Rawls (@okokillgo).

The talk was about using Lean Startup principles to improve code quality. Desmond was using the usual code quality metrics for this. That is good, but those metrics are not all that actionable, and they don’t directly show what (I think, and he also seemed to say) the goal is: code that can easily be changed to adapt to business demands.

Desmond also positioned the developer as the end user of the code. I don’t think that is the only way to look at it. The business, or better, the customer, is the one that requires the code to be as easily and cheaply changeable as possible.

This is probably going to be of limited use, because not many teams are using BDD, including the ubiquitous language, in a structured way. Also, I have no idea whether this is feasible at all, or whether the data would show anything interesting, but I kind of like the idea, so here goes.

If the language used in the formulation of a User Story, and in particular its Acceptance Criteria, is descriptive of the Domain, then that language should also be reflected in the design of the code.


In the original definition of Technical Debt, Ward Cunningham posited it as the decision to accept that your code would not yet reflect a change in (your understanding of) the domain, because changing the structure of the code was deemed too expensive at that point. Turning that around, the way to limit Technical Debt would be to make changes that reflect changes in the domain as cheap as possible. I’d propose that ‘good design’ is design that accomplishes exactly that.

So, assuming all of that, we would need three things in place: Acceptance Criteria in the code base (Cucumber feature files with scenarios), implementations of those scenarios that call the domain classes (not absolutely necessary, but it would be a great way to link domain and code), and domain classes that use the ubiquitous business language as used in the scenarios.
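To make that concrete, here’s a minimal sketch of those three pieces, assuming a Python BDD stack (behave). The feature text and the Trade/TradeBook domain classes are hypothetical examples I made up for illustration, not anything from the talk:

```python
# trades.feature (Gherkin), the Acceptance Criteria side:
#
#   Feature: Trade booking
#     Scenario: Booking a spot trade
#       Given a trade of type "spot"
#       When the trade is booked
#       Then the trade appears in the trade book

# steps/trade_steps.py -- the glue linking scenario language to domain classes:
from behave import given, when, then

from trading.domain import Trade, TradeBook  # hypothetical domain module


@given('a trade of type "{trade_type}"')
def step_given_trade(context, trade_type):
    # the domain class carries the same name as the word in the scenario
    context.trade = Trade(trade_type)


@when('the trade is booked')
def step_book_trade(context):
    context.book = TradeBook()
    context.book.book(context.trade)


@then('the trade appears in the trade book')
def step_trade_in_book(context):
    assert context.trade in context.book.trades
```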

If we’d do a frequency analysis of the words in the scenarios (excluding stopwords and other generic words), and perhaps look at how closely together words appear to identify relations between concepts, we should be able to generate a graph of the domain.
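A rough sketch of what that scenario-side analysis could look like, in Python. The stopword list and the co-occurrence window are arbitrary starting points:

```python
import re
from collections import Counter

# Gherkin keywords and filler words count as the 'generic words' to exclude;
# this list and the co-occurrence window are arbitrary starting points.
STOPWORDS = {"a", "an", "and", "are", "but", "given", "in", "is",
             "of", "the", "then", "to", "when"}

def domain_words(line):
    return [w for w in re.findall(r"[a-z]+", line.lower())
            if w not in STOPWORDS]

def domain_graph(scenario_lines, window=4):
    freq = Counter()   # word -> frequency
    edges = Counter()  # (word, word) -> co-occurrence count
    for line in scenario_lines:
        words = domain_words(line)
        freq.update(words)
        for i, word in enumerate(words):
            # words appearing close together hint at a relation between concepts
            for other in words[i + 1:i + window]:
                edges[tuple(sorted((word, other)))] += 1
    return freq, edges

scenario = [
    'Given a trade of type "spot"',
    'When the trade is booked',
    'Then the trade appears in the trade book',
]
freq, edges = domain_graph(scenario)
print(freq.most_common(3))  # 'trade' should dominate, as it does in the domain
```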

The same can be done for the code, where we can identify not just the keywords used, but also whether they are used as names of classes, objects, methods, or variables, giving us more information about their importance in the design/domain, and about their relations.
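For Python code, the standard ast module would get us most of the way. The weights per identifier kind below are pure guesses:

```python
import ast
import re
from collections import Counter

# arbitrary weights: a word used as a class name says more about the design
# than the same word used as a local variable
WEIGHTS = {"class": 3, "method": 2, "variable": 1}

def identifier_words(name):
    # split CamelCase and snake_case identifiers into lowercase domain words
    return [p.lower() for p in re.findall(r"[A-Z]?[a-z]+", name)]

def code_graph(source):
    freq = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            kind, name = "class", node.name
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind, name = "method", node.name
        elif isinstance(node, ast.Name):
            kind, name = "variable", node.id
        else:
            continue
        for word in identifier_words(name):
            freq[word] += WEIGHTS[kind]
    return freq

source = '''
class TradeBook:
    def book(self, trade):
        self.trades.append(trade)
'''
print(code_graph(source))  # 'book' and 'trade' should rank highest
```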

It would be interesting to see whether those two graphs are the same, and whether the analysis would show the same distribution of use of keywords.
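How best to compare the two graphs is an open question. As a naive first attempt, we could compare the frequency vectors with cosine similarity and the vocabularies with Jaccard overlap, where a score near 1.0 would suggest the code speaks the same language as the scenarios:

```python
import math

def cosine(freq_a, freq_b):
    # similarity of the two word-frequency distributions
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(freq_a, freq_b):
    # overlap of the two vocabularies, ignoring frequencies
    a, b = set(freq_a), set(freq_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# e.g. cosine(domain_freq, code_freq), jaccard(domain_freq, code_freq)
```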

Then, if there is a change, we can see whether that change introduces new concepts (e.g. a new product type ‘swap’, next to product type ‘spot’, both a type of ‘trade’), and if that is the case, how that new concept is then reflected in the code.
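Spotting such a new concept could be as simple as diffing the domain vocabulary of the feature files before and after the change:

```python
def new_concepts(old_freq, new_freq):
    """Domain words appearing in the new scenarios but not in the old ones."""
    return sorted(set(new_freq) - set(old_freq))

# adding a scenario for the new product type next to the 'spot' one
# would yield ['swap']
```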

If a new concept means the introduction of one or more new classes, then we can measure that, and give it a score. If it means changes in one or more existing classes, that could indicate a higher than desirable level of coupling, and would also get a score.

These scores could be added up, and together might give us an indication of the changeability of the code, and thus a score for the design. This would give us a way to compare the relative merits of two designs, perhaps even across domains. If we notice that changes to some parts of the domain are more expensive than changes to others, we could trigger an investigation to see which parts of the code need change, and how the design there could be improved to avoid that.
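A naive version of such a score, assuming we can classify each file in a commit as added or modified (e.g. from `git diff --name-status`). The penalty values are pure guesses; the point is that spreading one domain concept across many existing classes should cost more than adding one new class:

```python
# guessed penalties: a concept landing in a new class is cheap, a concept
# forcing edits to existing classes hints at coupling and costs more
NEW_CLASS_COST = 1
MODIFIED_CLASS_COST = 3

def change_score(added_files, modified_files):
    return (len(added_files) * NEW_CLASS_COST
            + len(modified_files) * MODIFIED_CLASS_COST)

# introducing 'swap' as one new class:
#   change_score(['swap.py'], []) == 1
# versus patching it into three existing classes:
#   change_score([], ['trade.py', 'book.py', 'pricing.py']) == 9
```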

Would this also help make a more actionable metric? Maybe. The change in the feature files would precede the change in the codebase. This means we can give a score to the domain change before we start on the code change, giving an indication of the complexity of the change. If we take an API-first approach (feature glue code first, then generating new classes and methods with empty implementations), that could already show us whether the change to the design is more complex than necessary, and allow us to refactor right then (make the change easy, before making the easy change).
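What scoring the domain change up front could look like, reusing the domain_graph sketch from above; weighting new words and new relations equally is an arbitrary choice:

```python
def domain_change_score(old_graph, new_graph):
    old_freq, old_edges = old_graph
    new_freq, new_edges = new_graph
    new_words = set(new_freq) - set(old_freq)        # new concepts
    new_relations = set(new_edges) - set(old_edges)  # new relations between concepts
    return len(new_words) + len(new_relations)

# e.g. domain_change_score(domain_graph(old_scenarios), domain_graph(new_scenarios)),
# computed before a single line of production code has changed
```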

Touching other files would start incrementing our design score, and might trigger another type of refactoring. All of this depends on short cycles and frequent commits/check-ins.

It seems like a cool idea to me. Please tell me all the ways in which this will never work!