Usefulness Ranking of Code Metrics

Static code analysis is one of the more controversial fields of software engineering. “Misleading bogus!” screamers and “You must not work without it!” pleaders bash each other’s heads in like survivors of a zombie war. My contribution to this argument is an attempt to evaluate the usefulness of different code analysis metrics.

Since I mostly work on Java projects with Sonar as the main analysis tool, my ranking is centered on this environment. Some of the mentioned metrics don’t even exist outside of Sonar. Nonetheless, much of it should transfer easily to other programming languages or to software design in general.

Of course this ranking is highly subjective, based on my personal experience, and only partly informational, since most developers already know what all these code metrics mean. More than that, I hope to trigger a discussion about their usefulness and importance.

First things first: What makes a code metric useful?

  • It outright shows a violation of, or deviation from, a defined best practice.
  • It points to a place in your source code that has design flaws.
  • It shows that a certain aspect of your project is highly neglected and needs to be worked on.
  • It can be quantified: you can define a “normal” or “good” value and react when this norm is violated.

If any of these is true, a metric can be considered somewhat useful. So let’s dive into it and have a look at the different metrics, starting with the most useful ones:

1. Cyclomatic Complexity

The cyclomatic complexity of classes and methods turns out to be my favorite code metric. It almost always hints at flawed design, because too much decision logic is crammed into one method or class. Such code is often not properly unit tested, because complex units are very difficult to test: every distinct execution path through the code should have its own unit test. In general you should strive for low complexity in every global and local scope of your project, which makes cyclomatic complexity a very important and useful measurement.
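To make the counting concrete, here is a minimal, hypothetical sketch (the Order and Item types are made up for this example). Every branching construct and boolean operator adds one decision point, and every decision point is one more execution path to cover with a test:

    import java.util.List;

    public class ShippingCalculator {

        record Item(double weight) {}
        record Order(List<Item> items, boolean express) {}

        // Base complexity 1, plus one per decision point marked below:
        // cyclomatic complexity 5, so full branch coverage already
        // requires five test cases for this one small method.
        static double shippingCost(Order order) {
            if (order == null || order.items().isEmpty()) { // +1 (if), +1 (||)
                throw new IllegalArgumentException("empty order");
            }
            double cost = 0;
            for (Item item : order.items()) {               // +1 (loop)
                cost += item.weight() * 0.5;
            }
            if (order.express()) {                          // +1 (if)
                cost *= 2;
            }
            return cost;
        }
    }

Extracting the validation and the express surcharge into their own small methods would not remove any execution paths, but it caps the complexity of each unit at a level that is still testable in isolation.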

2. Duplications

Avoiding code duplication is a major topic for every author on code design, and for good reason. The corresponding metric plainly and simply points out your duplicated code, and by doing so gives you opportunities to eliminate serious error sources and to reduce future work. Often it tells you that a new layer of abstraction is needed or that you have to rethink your module/package/class structure to centralize the duplicated logic. Without the duplication metric these places are very difficult to find.
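A made-up example of the kind of finding the metric produces – the same validation logic copy-pasted into two services – and the centralization it suggests (real detectors only flag larger blocks than this, but the principle is the same):

    class CustomerService {
        void register(String email) {
            if (email == null || !email.contains("@")) {  // duplicated here ...
                throw new IllegalArgumentException("invalid email: " + email);
            }
            // ... create the customer
        }
    }

    class NewsletterService {
        void subscribe(String email) {
            if (email == null || !email.contains("@")) {  // ... and here again
                throw new IllegalArgumentException("invalid email: " + email);
            }
            // ... add to the mailing list
        }
    }

    // The fix the metric points to: one central place for the shared logic.
    final class EmailValidator {
        static void requireValid(String email) {
            if (email == null || !email.contains("@")) {
                throw new IllegalArgumentException("invalid email: " + email);
            }
        }
    }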

3. Rules Compliance (Sonar)

The Sonar Rules Compliance (RC) shows the relative amount of coding rule violations in your project. Under the hood it runs static code analysis with PMD, Checkstyle, FindBugs, etc.

Improving the RC forces the developers to learn how to avoid rule violations, thus improving the code quality. As a side effect, different developers are nudged towards the same coding style by following the same rules. Other than that, the RC is a good measurement for getting a rough impression of the overall code quality of a project, because so many different violations contribute to it. This also makes it a good instrument of comparison: you can compare the RC of different projects to get a rough idea of their relative code quality.

Drilling down into the specific violations sometimes reveals design flaws, although the interesting violations are often hard to spot in the mass of unimportant ones.
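Two hypothetical snippets of the kind of findings these tools typically report (the exact rule names vary by tool and version, so treat them as approximate):

    public class RuleViolationExamples {

        // FindBugs flags comparisons of strings with ==:
        // it checks object identity, not content.
        static boolean isActiveBroken(String status) {
            return status == "ACTIVE";          // violation
        }

        static boolean isActive(String status) {
            return "ACTIVE".equals(status);     // compliant and null-safe
        }

        // PMD flags empty catch blocks:
        // the error is silently swallowed.
        static int parseOrDefaultBroken(String input) {
            try {
                return Integer.parseInt(input);
            } catch (NumberFormatException e) { // violation: empty catch
            }
            return -1;
        }
    }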

4. Package Tangling

First and foremost, the package tangle index and similar metrics show you cyclic dependencies, which are always bad. In addition, they can identify dependency magnets like util packages that are used all over the project, which makes changing them quite difficult.
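A minimal, hypothetical tangle – two classes in two packages that import each other, so neither package can be compiled, tested, or reused on its own:

    // File: com/example/order/Order.java
    package com.example.order;

    import com.example.customer.Customer;   // order -> customer

    public class Order {
        public Customer buyer;
    }

    // File: com/example/customer/Customer.java
    package com.example.customer;

    import java.util.List;
    import com.example.order.Order;         // customer -> order: a cycle

    public class Customer {
        public List<Order> history;
    }

Breaking such a cycle usually means inverting one of the two dependencies, for example by letting Customer store order IDs instead of Order objects, or by extracting an interface into a third package.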

5. LCOM4

Lack of Cohesion of Methods tells you how strongly the methods inside a class belong together by measuring whether they use the same members of the class. An LCOM4 higher than one often points you to a violation of the Single Responsibility Principle.
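A made-up class that would get an LCOM4 of 2: its methods fall into two groups that share no fields and never call each other – two responsibilities in one class:

    public class ReportManager {

        // Group 1: only these two methods touch 'template'.
        private String template;

        public void loadTemplate(String name) {
            template = "templates/" + name;
        }

        public String render() {
            return "rendered " + template;
        }

        // Group 2: only these two methods touch 'smtpHost'.
        private String smtpHost;

        public void configureMail(String host) {
            smtpHost = host;
        }

        public void send(String report) {
            System.out.println("sending via " + smtpHost + ": " + report);
        }
    }

Splitting it along those two groups, say into a ReportRenderer and a ReportMailer, would give two classes with an LCOM4 of 1 each.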

6. Lines of Code

Wait, what? Lines of code is not at the end of the list? Isn’t that just the bogus number that tells us absolutely nothing and that encouraged developers in the past to produce crap because they were paid by lines of code?

Well, on the one hand this is true – on the other hand it isn’t that useless in my opinion. If you look at the lines of code broken down by class or method, you will almost always find a badly designed piece of code at the top of the list. Most of the time the largest class in a project is the “black sheep” that every developer fears to change and where redesign is needed the most. Lines of code per class helps you identify it.

7. Sonar Quality Index

Sonar’s Quality Index tries to merge several other indexes into one number to give an overall indicator of code quality. It doesn’t do a very good job though, because the formulas and weightings behind it are not really intuitive, which makes it a pretty opaque and meaningless measurement. You can use it to roughly compare projects with one another, but it won’t really help you increase your code quality.

8. Sonar Complexity Factor

The Sonar Complexity Factor is so far down the list because it is always zero. Always! You say I am lying and you have seen it above zero? Well, then measuring the code quality of your project is one of your lesser problems. The Complexity Factor only rises above zero when you have a cyclomatic complexity of 31 or more somewhere in your code – that means a method with 31 or more different execution paths. That’s the kind of code you don’t want to change anymore, let alone fix. You just want to release it from its pain and throw it away. A metric that only shows something when the game-over screen is already flashing in front of you doesn’t help at all.

9. Lines of Comments

Counting lines of comments to evaluate your code is one of the worst ideas I’ve heard. Sonar, for example, even tells you that it is good to have more lines of comments – whaaaat? Didn’t we learn in Uncle Bob’s Clean Code that “comments are not like Schindler’s List, they are not pure good”?

Yes, you should describe your API properly with Javadoc. But you should try to reduce the comments that describe your code; the code should describe itself. These opposing goals make it impossible to define a “good” amount of comments, which makes this metric completely useless.
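A small, made-up before/after to illustrate the point:

    public class SelfDescribingCode {

        // Before: the comment only restates the code and can silently go stale.
        static boolean mayOrderBefore(int age) {
            // the customer must be an adult to place an order
            return age >= 18;
        }

        // After: a named constant and a well-named method make the comment redundant.
        private static final int LEGAL_AGE = 18;

        static boolean isAdult(int age) {
            return age >= LEGAL_AGE;
        }
    }

A comment counter rewards the first version and penalizes the second, which is exactly backwards.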


OK, that’s it. I hope I could give you a little insight into the usefulness of some code metrics – or at least got a “bullshit, that guy has no clue” out of you to spur you into discussion.

One more thing: in my opinion, a newly started greenfield project should try to keep the first five metrics on the list at their optimum (duplication and dependency cycles at 0, LCOM4 at 1, complexity near 1, Rules Compliance at 100%). This is not impossible, and it not only gives you a very good feeling about your code but also saves you a lot of work in the long run.