Measuring and defining code quality is an eternal topic in the programming world. I think everyone who has worked on large projects with a long history agrees that the code has to be kept in good condition. But there is not always enough time to work out which characteristics matter for a particular project. This article will not describe how to write and design code, or whether spaces around brackets are needed. Instead, I will try to highlight the most important aspects to pay attention to and what they affect. What the acceptable limits are and how to monitor them is up to you.
What is quality code?
There is no precise definition of this term. As a rule, a specialist's idea of what high-quality source code looks like comes from many years of experience. Some programmers follow the abstract KISS principle, which stands for "Keep It Simple, Stupid!". In part this approach is fair, as it reflects the main rule of good code: simplicity and clarity. However, simplicity is often confused with oversimplification, so in a professional environment the quality of source code is judged by several more properties:
The code is not overloaded with complex constructs, so it is easy to understand even without additional documentation or comments;
It is easy to maintain: changes to well-designed code, such as a new configuration or even a new platform, are easy to make;
It is easy to add new functionality without the risk of breaking existing behavior. Even if problems do appear, they can be fixed quickly;
Good code can be handed over to other developers for support or revision, and they will have no difficulty reading it;
The higher the percentage of code covered by tests, the more likely it is that unnecessary bugs will be avoided in the future.
To make code easier to understand in a professional environment, each programming language has its own code style, that is, a formatting standard. It dictates the rules: where to put spaces or brackets, how to break lines, how to name variables. These nuances may seem unimportant, but following them makes the code much easier to understand for those who see it for the first time.
Not every programmer can write really good code. This is especially difficult for those who are just gaining experience. But even competent developers make mistakes from time to time. That is why companies that create high-quality software review their code regularly.
Compliance with the rules
This characteristic applies to code that compiles and, in most cases, does its job correctly. It is an interesting characteristic, largely because the company must first have rules for writing code. You can take the easy route and adopt the work of others (Java Code Conventions, GCC Coding Conventions, Zend Coding Standard), or you can put in the effort and supplement them with your own rules, the ones best suited to the specifics of your company.
But why do we need rules for writing code if the code does its job? To answer the question, let us distinguish several types of rules:
syntactic rules - these look like the most useless rules (but only at first glance), since they do not affect the execution of the program in any way. They include the naming style for variables (camelCase, underscores), constants (uppercase) and methods, the placement of curly braces, and whether braces are needed when a block contains only one line of code. The list goes on. When programmers write code, they read it easily because they know their own style. But give them code that uses Hungarian notation and braces on a new line, and they will have to spend extra attention on taking in the unfamiliar style. The situation is especially amusing when several completely different styles are used in the same project or even the same module.
code maintenance rules - these rules should signal that the code is too complex and will be difficult to maintain. Examples: the complexity index (more about it below) of a method or class is too high, a method has too many lines of code, the code contains duplicates or "magic numbers". I think the essence is clear: they all point to trouble spots that will be hard to maintain. But we must not forget that it is up to us to decide which complexity index is too high and which is acceptable.
code cleaning and optimization - these are the simplest rules, in the sense that hardly anyone will argue that a piece of code is necessary when it is not used anywhere. This covers unnecessary imports, variables, and methods that are no longer used but for some reason were left behind as a legacy.
The metric here is obvious: compliance with the rules should approach 100%; that is, the fewer rule violations, the better.
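To make the idea of automated rule checks more concrete, here is a minimal sketch in Python that uses the standard `ast` module to find two of the violations mentioned above: unused imports and "magic numbers". The function name `find_violations`, the `ALLOWED_NUMBERS` set, and the convention that a number is fine when it is assigned to an UPPER_CASE constant are my assumptions for illustration. Real linters are far more thorough.

```python
import ast

# Assumption: 0 and 1 are too common to be treated as "magic".
ALLOWED_NUMBERS = {0, 1}

SAMPLE = """\
import os
import sys

TAX_RATE = 0.2


def price_with_tax(price):
    print(sys.argv)
    return price * 1.2 + 5
"""


def find_violations(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, message) pairs for two illustrative rules."""
    tree = ast.parse(source)
    violations = []

    # Rule 1: every imported name must be used somewhere in the file.
    imported = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound_name = alias.asname or alias.name.split(".")[0]
                imported[bound_name] = node.lineno
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    for name, line in imported.items():
        if name not in used:
            violations.append((line, f"unused import '{name}'"))

    # Rule 2: a numeric literal must be given a name, so it may only appear
    # as the value of an UPPER_CASE constant assignment.
    named_literals = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if all(isinstance(t, ast.Name) and t.id.isupper() for t in node.targets):
                named_literals.add(id(node.value))
    for node in ast.walk(tree):
        if (isinstance(node, ast.Constant)
                and type(node.value) in (int, float)
                and node.value not in ALLOWED_NUMBERS
                and id(node) not in named_literals):
            violations.append((node.lineno, f"magic number {node.value!r}"))

    return sorted(violations)
```

Calling `find_violations(SAMPLE)` reports the unused `os` import and the two unnamed numbers in the `return` statement, while `TAX_RATE = 0.2` passes because the number has a name. The number of violations per file or per project is then the value to drive toward zero.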
The cyclomatic complexity of the code
A characteristic that directly affects how hard the code is to maintain. Here the metric is harder to pin down than for the previous characteristic. Put simply, it grows with the number of branching operators and loops: every additional decision point adds one more independent path through the code. If you are interested in a more detailed description, you can read it on Wikipedia. The lower the value, the better, and the easier it will be to restructure the code in the future. It is worth measuring the complexity of a method, class, or file. The value of this metric should be capped at some limit. For example, the cyclomatic complexity of a method should not exceed 10; otherwise, the method needs to be simplified or split up.
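As an illustration, here is a rough sketch of how this metric can be estimated for Python functions: start at 1 and add 1 for every decision point. The helper names (`cyclomatic_complexity`, `complexity_report`, `over_limit`) and the exact set of counted constructs are my assumptions. Dedicated tools count more cases and may give slightly different numbers.

```python
import ast

# Constructs that each open one more independent path through the code.
DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)

SAMPLE = """
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
"""


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate complexity: 1 plus the number of decision points."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contains two extra short-circuit decisions.
            complexity += len(node.values) - 1
    return complexity


def complexity_report(source: str) -> dict[str, int]:
    """Map every function name in the source to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def over_limit(report: dict[str, int], limit: int = 10) -> list[str]:
    """Names of functions whose complexity exceeds the chosen limit."""
    return [name for name, value in report.items() if value > limit]
```

For `SAMPLE`, the function `classify` scores 5: the base value of 1, two `if` statements, one `for` loop, and one `and`. With the limit of 10 from the text, `over_limit` returns an empty list for it.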
Duplication
An important characteristic that shows how easy it will be to change the code in the future (or the present). The metric can be expressed as a percentage: the ratio of duplicated lines to all lines of code. The fewer duplicates, the easier it will be to live with this code.
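A simplified sketch of this ratio in Python is shown below. It treats a line as duplicated when it belongs to a block of `window` consecutive non-blank lines that occurs more than once. The function name, the default window of 3 lines, and the whitespace-only normalization are my assumptions. Real duplicate detectors usually compare tokens rather than raw lines.

```python
from collections import defaultdict

SAMPLE = """
total = 0
for item in cart:
    total += item.price
print(total)

total = 0
for item in cart:
    total += item.price
save(total)
"""


def duplication_ratio(source: str, window: int = 3) -> float:
    """Share of non-blank lines that sit inside a repeated block of lines."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if len(lines) < window:
        return 0.0

    # Remember where every block of `window` consecutive lines starts.
    positions = defaultdict(list)
    for start in range(len(lines) - window + 1):
        block = tuple(lines[start:start + window])
        positions[block].append(start)

    # Mark every line that is part of a block seen at least twice.
    duplicated = set()
    for starts in positions.values():
        if len(starts) > 1:
            for start in starts:
                duplicated.update(range(start, start + window))

    return len(duplicated) / len(lines)
```

In `SAMPLE`, the three-line summing loop appears twice, so 6 of the 8 non-blank lines are duplicated and the ratio is 0.75.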
Comments
One of the most contentious and painful topics among programmers: "To comment or not to comment?" Everyone is familiar with the dialogue from Steve McConnell's book; it has already been published on Habr. From it we can conclude that this characteristic has to be approached individually, based on the specifics of the company and the products it works on: for small projects commenting is not so necessary, while for large ones well-developed rules will greatly ease maintenance. Two important metrics can be identified for commenting:
the ratio of comments to the entire code - this metric gives an idea of how detailed the comments are and how useful they can be. Of course, it cannot tell you whether the code contains comments like "loop" placed before a loop, but that should be caught during code review.
commenting on public methods - the ratio of commented public methods to their total number. Since public methods are used outside the class or package, it is better to describe what the method should do and what it can affect. The number of public methods without a comment should tend to zero.
As I have already written, it is better to settle the question of commenting based on the needs of the company, but commented code is still easier to live with.
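Both comment metrics can be sketched with the Python standard library. In this sketch I assume that a "comment" is a `#` comment found by `tokenize`, that a "commented" function is one with a docstring, and that a name without a leading underscore is public (the usual Python convention). The function names are hypothetical.

```python
import ast
import io
import tokenize

SAMPLE = '''
def area(width, height):
    """Return the area of a rectangle."""
    return width * height


def perimeter(width, height):
    # both sides are counted twice
    return 2 * (width + height)


def _helper():
    return None
'''


def comment_ratio(source: str) -> float:
    """Share of non-blank lines that contain a '#' comment."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    comment_lines = {tok.start[0] for tok in tokens if tok.type == tokenize.COMMENT}
    non_blank = [line for line in source.splitlines() if line.strip()]
    return len(comment_lines) / len(non_blank) if non_blank else 0.0


def undocumented_public_functions(source: str) -> list[str]:
    """Names of public functions and methods that have no docstring."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
        and ast.get_docstring(node) is None
    ]
```

For `SAMPLE`, one of the eight non-blank lines is a comment, so the ratio is 0.125, and `perimeter` is the only public function reported as undocumented. The private `_helper` is ignored.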
Test coverage
There is no need to describe the necessity and role of automated tests here, because that is a topic for a separate article. But this is a very important characteristic of code quality. The coverage level is calculated as the ratio of the number of code elements covered by tests to the number of all existing elements. Depending on what counts as a code element, the following types of coverage are commonly distinguished:
file coverage - here you first need to decide what it means for a file to be covered by tests: often a file counts as covered if a test entered it and executed at least one line of its code. For this reason the metric is used extremely rarely, but it still has a right to exist.
class coverage - the same as file coverage, only for classes :). Also rarely used.
method coverage - calculated the same way. Method coverage can be more widely useful, though: if your project has a rule that each method must be covered by at least one test, this metric quickly finds code that does not comply.
line coverage - one of the most widely used coverage metrics. It is calculated the same way, with a line of code taken as the element.
branch coverage - the same again, with a branch taken as the element. A good value for this metric takes the most effort to achieve, so it shows how conscientiously the programmer approached test coverage.
total coverage - a coverage metric whose calculation takes into account not one kind of element but several. The most common variant combines line and branch coverage.
The higher the test coverage, the lower the risk of breaking part of the system without noticing it.
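The arithmetic behind these ratios is simple, and a small sketch shows why branch coverage is harder to satisfy than line coverage. The `CoverageCounts` class and its field names are hypothetical. Treating "nothing to cover" as full coverage and pooling lines and branches into one total are my assumptions, since tools differ on both points.

```python
from dataclasses import dataclass


def ratio(covered: int, total: int) -> float:
    """Covered elements divided by all elements."""
    # Assumption: when there is nothing to cover, report full coverage.
    return covered / total if total else 1.0


@dataclass
class CoverageCounts:
    covered_lines: int
    total_lines: int
    covered_branches: int
    total_branches: int

    @property
    def line_coverage(self) -> float:
        return ratio(self.covered_lines, self.total_lines)

    @property
    def branch_coverage(self) -> float:
        return ratio(self.covered_branches, self.total_branches)

    @property
    def total_coverage(self) -> float:
        # Lines and branches are pooled into one set of elements.
        return ratio(
            self.covered_lines + self.covered_branches,
            self.total_lines + self.total_branches,
        )


# Function under test (three executable lines in the body, one two-way branch):
#
#     def absolute(x):
#         if x < 0:
#             x = -x
#         return x
#
# A single test that calls absolute(-5) executes all three lines,
# but takes only the "true" outcome of the condition.
counts = CoverageCounts(covered_lines=3, total_lines=3,
                        covered_branches=1, total_branches=2)
```

Here `counts.line_coverage` is 1.0, `counts.branch_coverage` is 0.5, and `counts.total_coverage` is 0.8. A report that shows only line coverage would hide the fact that the non-negative case is never tested.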
Instead of a conclusion
The list presented here is not complete, but it may well be enough to keep the code in a high-quality condition. All these characteristics can be measured by code analysis tools, and it is good practice to automate this process. I hope the article will be useful to someone.