Refactoring: An Introduction

2023-09-01 by Antonia Runge

Refactoring (noun):

a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior. (1)

Refactoring is a disciplined way of cleaning up code by applying certain rules that aim to minimize the risk of breaking the functionality.

Content of this article

When code is developed and new features are added over time, it almost inevitably becomes messy at some point. Messy code is hard to maintain and tends to accumulate bugs. The solution to this problem is refactoring: cleaning up the code without changing the functionality. In this article, we explain what refactoring means, in which situations it is typically performed and in which cases it may not be the best choice.

What makes code messy?

Code won't be right the first time around. Even senior developers who write almost perfect code for the current situation are confronted with changing demands of clients or environments and the need to add or change functionality.

Code is usually developed iteratively: When new functionality is added over time, modules and components can become heavily dependent on each other, making modifications to individual parts challenging.

Time pressure can lead to a reduced focus on writing clean and structured code.

Limited experience with best practices or a lack of awareness of how to write clean and maintainable code can make the code base chaotic.

Insufficient planning can result in code that lacks a separation of concerns. Dependencies between components grow, which makes the code harder to understand.

Ineffective communication of coding standards and practices between collaborators can lead to different coding styles and approaches across the project. This can result in inconsistencies and varying levels of code structure.

Postponing refactoring or maintenance tasks can lead to a code base that is hard and expensive to maintain.

Why should we refactor?

The need to refactor does not mean that mistakes were made in the past. It is a natural process that code becomes less structured over time. Regular refactoring of code can:

increase code quality by removing duplicated code, long functions or methods, and complex conditionals,
make code more readable, more modular, easier to understand and cheaper to modify,
make it easier to find bugs and reduce the risk of introducing bugs during modifications, which leads to more robust code,
increase maintainability: cleaner code is easier and faster to modify and extend, so development speeds up.

Without any refactoring there will be a point in time when developers have to spend all their available resources on bug fixing instead of adding new functionality. Additionally, it will be very time-consuming to change functionality. First, one has to understand the messy code base. Second, one has to find the part(s) of the code that must be touched. And third, the risk of introducing bugs or breaking the functionality while changing the code will be very high.

When should we refactor?

As guidance, one can follow (2):

The Rule of Three

The first time you do something, you just do it. The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway. The third time you do something similar, you refactor.

For those who like baseball: Three strikes, then you refactor.

The following sections describe situations where it is advised to include refactoring into the process of continuous development.

1. Preparatory Refactoring

One approach is to refactor code just before adding a new feature to make it easier to add the feature. When the existing code is not structured in a way that makes the new feature easy to add, then: "Make the change easy, then make the easy change" (Kent Beck)

It's like I want to go 100 miles east but instead of just traipsing through the woods, I’m going to drive 20 miles north to the highway and then I’m going to go 100 miles east at three times the speed I could have if I just went straight there. When people are pushing you to just go straight there, sometimes you need to say, “Wait, I need to check the map and find the quickest route.” The preparatory refactoring does that for me.

(Jessica Kerr)
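A minimal sketch of this idea in Python, using made-up names (shipping_cost, RATES) and made-up rates that are not taken from any real project: the existing code handles each country in its own branch, and a new country should be supported. Instead of adding another branch, the code is first restructured without changing its behavior, and then the new feature becomes a one-line change.

```python
# Before: every new country needs another branch.
def shipping_cost_before(country: str, weight_kg: float) -> float:
    if country == "DE":
        return 4.0 + 0.5 * weight_kg
    elif country == "FR":
        return 6.0 + 0.7 * weight_kg
    raise ValueError(f"unsupported country: {country}")


# Step 1 (refactoring, behavior unchanged): move the rates into a table.
RATES = {
    "DE": (4.0, 0.5),  # (base fee, fee per kg); hypothetical values
    "FR": (6.0, 0.7),
}


def shipping_cost(country: str, weight_kg: float) -> float:
    try:
        base, per_kg = RATES[country]
    except KeyError:
        raise ValueError(f"unsupported country: {country}") from None
    return base + per_kg * weight_kg


# Step 2 (the "easy change"): the new feature is now a single entry.
RATES["IT"] = (7.0, 0.8)
```

Step 1 can be checked against the old function before step 2 is made, so the structural change and the behavioral change stay separate.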

2. Comprehension Refactoring

Before code can be modified, whether it was written by oneself or by someone else, one must understand what it does. Whenever one has to stop and think to understand what the code is doing, for example because of awkwardly structured logic or badly named functions, this is a sign that refactoring should be considered.
After having understood the logic behind a part of code, some understanding is in one's head. By refactoring one moves the understanding back into the code itself.
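As a small illustration in Python (the function and the numbers are invented for this example), consider a function whose purpose only becomes clear after working out what the constants mean. Once that is understood, the knowledge can be stored in names:

```python
# Before: the reader has to work out what 3.6 and 50 stand for.
def calc(d, t):
    return d / t * 3.6 > 50


# After: the understanding gained while reading is kept in the code.
MS_TO_KMH = 3.6        # metres per second -> kilometres per hour
SPEED_LIMIT_KMH = 50   # hypothetical limit for this example


def speed_kmh(distance_m: float, duration_s: float) -> float:
    return distance_m / duration_s * MS_TO_KMH


def is_speeding(distance_m: float, duration_s: float) -> bool:
    return speed_kmh(distance_m, duration_s) > SPEED_LIMIT_KMH
```

Both versions compute the same result; only the names and the structure differ.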

Litter-Pickup Refactoring is a variation of comprehension refactoring: One understands what the code is doing, but it is doing it badly, for example because of deeply nested logic or almost identical functions where a new parameter could remove the duplicated code.
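The case of almost identical functions could look like the following Python sketch. The invoice structure and function names are assumptions made for the example; amounts are integers (cents) to keep the arithmetic exact.

```python
# Before: two functions that differ only in the status they filter on.
def total_paid_invoices(invoices):
    return sum(i["amount"] for i in invoices if i["status"] == "paid")


def total_open_invoices(invoices):
    return sum(i["amount"] for i in invoices if i["status"] == "open")


# After: one function, the difference is expressed as a parameter.
def total_invoices(invoices, status):
    return sum(i["amount"] for i in invoices if i["status"] == status)
```

A call such as total_invoices(invoices, "paid") then replaces total_paid_invoices(invoices), and a third status no longer requires a third copy of the function.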

The tradeoff here is that restructuring the code costs time, but leaving trash lying around may obstruct future changes. If it's easy, fix it. If it's more effort, make a note and fix it later. Regularly maintaining the code base prevents the accumulation of technical debt.

An old camping adage says:

"Always leave the campground cleaner than you found it."

3. Planned vs. Opportunistic Refactoring

If the time to spend on refactoring wasn't planned, like for preparatory, comprehension and litter-pickup refactoring, one can see it as opportunistic. It is part of the programming flow and simply happens when e.g. adding a new feature or fixing a bug.

When writing code there are always tradeoffs:

how much to parameterize?
where to split functions?

Excellently written code with the right tradeoffs for yesterday's features may have the wrong tradeoffs for today's features. That is why not only ugly code needs refactoring.

However, when writing clean and maintainable code, planned refactoring should be rare. Most refactoring should be unremarkable, the opportunistic kind. For example, often the fastest way of adding a new feature is to alter existing code and make it easy to add the new feature. Here, the number of lines of changed code easily becomes more than the number of lines of actually newly developed code.

In contrast, writing mainly new code and treating software development as a process of accretion often requires a large effort later, when one tries to maintain a project that has grown into a complex object with many layers.

Similarly, postponing or neglecting refactoring can lead to a deeply nested code base that is hard to control. Dedicated, planned time is then needed to bring the code base into a better state before new features can be added.

4. Long-Term Refactoring

Most refactorings take minutes or hours at most.

Some large ones can take weeks to complete, such as pulling some part of the code into a module and sharing it with other teams, or removing complex dependencies. In this case, it helps to work on the changes gradually over time and to always leave the code in a working state.
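One possible way to keep the code working during such a migration, sketched here in Python with invented function names, is to move the logic to its new home and keep the old name as a thin wrapper until all callers have been migrated. This is only one option among several, not a prescribed procedure.

```python
import warnings


def normalize_email(address: str) -> str:
    """New home of the logic (imagine it lives in a shared module)."""
    return address.strip().lower()


def clean_mail(address: str) -> str:
    """Old name, kept as a thin wrapper while callers are migrated."""
    warnings.warn(
        "clean_mail is deprecated, use normalize_email",
        DeprecationWarning,
        stacklevel=2,
    )
    return normalize_email(address)
```

Existing callers of clean_mail keep working and can be switched over one at a time; once none are left, the wrapper is deleted.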

5. Refactoring during Code Reviews

Code reviews

spread knowledge,
are important for writing clear code, and
give the chance to suggest useful ideas.

Refactoring can help to understand someone else's code.

When ideas that are easy to implement come up during a code review, it is worth refactoring right away. One can test the code with the suggestions in place and possibly come up with even more ideas. This gives the code review more concrete results. A safe way to do this is to open a new branch, refactor the code and discuss the changes with the author of the code under review before merging anything into the author's branch. To get more context on the logic of an implementation, the reviewer can involve the author in the process of refactoring.

The next logical step, sitting one-on-one with the original author, is called pair programming.

Additionally, we can apply refactoring during performance optimization and within team collaboration:

Refactoring can improve performance, e.g. by removing dependencies, optimizing data structures or algorithms, or reducing resource usage.
Consistent coding standards can be ensured and knowledge can be shared.

Why tests?

Refactoring means improving the design of a codebase without changing its observable behavior. Nothing should break. But mistakes happen. If an error is caught quickly, a lot of time spent on bug hunting can be saved. Tests are a key factor in identifying any regression or unintended side effects after a change.

If test coverage is poor, it is advisable to first create tests that capture the expected results for the part of the code to be changed, before making any modifications.
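A minimal Python sketch of this approach, with a hypothetical function legacy_discount standing in for the code to be changed: the current outputs for a few representative inputs are recorded first, and the same check is then run against the refactored version. Plain assert statements are used here to keep the example self-contained; in a real project these checks would normally live in the project's test suite.

```python
def legacy_discount(total, is_member):
    # Existing code: nested and hard to read, but its behavior is the spec.
    if is_member:
        if total >= 100:
            return total * 90 // 100
        else:
            return total * 95 // 100
    else:
        if total >= 100:
            return total * 95 // 100
        else:
            return total


# Written BEFORE refactoring: pin down what the code does today.
EXPECTED = {
    (200, True): 180,
    (50, True): 47,
    (200, False): 190,
    (50, False): 50,
}


def check(discount_function):
    for (total, is_member), expected in EXPECTED.items():
        assert discount_function(total, is_member) == expected


check(legacy_discount)  # the tests pass on the old code


# Refactored version: same behavior, flatter structure.
def discount(total, is_member):
    percent_off = (5 if is_member else 0) + (5 if total >= 100 else 0)
    return total * (100 - percent_off) // 100


check(discount)  # and still pass after the change
```

The recorded cases cover each branch of the old function once, which is the minimum needed to notice if a branch changes its result.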

Best-practice refactoring is done in small steps, each followed by a test run. This way, faults are identified quickly and only small changes must be checked. If the mistake cannot be found, one only has to revert a small part of the code.

In practical terms, this means it should be possible to run the tests automatically, so that nobody is discouraged by having to run manual tests frequently. Without self-testing code there are legitimate concerns about bugs and unintended side effects being introduced.

What are the risks?

Whenever one makes changes to existing code, there are some risks one should keep an eye on.

Breaking the functionality: Refactoring means that code will be changed. While this can help to find bugs, there is also the risk of introducing bugs, breaking the functionality or altering the behavior of a system without noticing.

Partial or incomplete refactoring: Interrupting a refactoring can leave behind inconsistent coding standards and code that is even harder to understand and to maintain.

Time-consuming: When refactoring complex systems, careful planning, analysis and implementation will consume time.

Unexpected effects from dependencies: Components, classes or modules often depend on each other. Overlooking these dependencies can introduce cascading effects that are difficult to manage.

Inadvertent changes: Developers who lack experience with programming principles and knowledge of refactoring techniques may make inappropriate changes and add new problems.

Such risks can be significantly reduced by following best practices, such as

working incrementally
pursuing a solid testing strategy
ensuring clear documentation
tracking the changes
involving experienced developers

When should we not refactor?

Nothing to modify: If there is nothing to modify and the application is running as expected, then we can ignore possibly messy code. There would also be no benefit from understanding the application better. There should be a concrete need to refactor an application, otherwise the effort and potential risks might outweigh the benefits.

Tight deadlines: Time pressure might be a reason to postpone refactoring. Refactoring of larger parts of a codebase can become very time-consuming.

Lacking testing infrastructure: If tests are rare, they need to be written first. Without tests one can rarely validate the correctness of the refactoring. The risk of breaking the code becomes very high.

Highly critical infrastructure: If the risks outweigh the benefits, refactoring should be considered carefully. Major changes to critical code may come at the cost of system stability.

Easier to rewrite: Sometimes it is easier to rewrite the whole application instead of refactoring it. Deciding when to refactor and when to rewrite takes experience and good judgement. Sometimes a first attempt at refactoring or rewriting the application is needed to understand the extent of the work. However, rewriting everything is a significant amount of work. Within a live environment, one has to maintain two systems during the transition period. Time will not stand still, and new features need to be incorporated into both the old and the new system. Based on our experience, it is often better to refactor instead of rewriting everything, even though the initial impulse may be to start from scratch.

How to convince stakeholders?

Sometimes customers or managers believe that refactoring is fixing errors made in the past, or work without any value.

Technology-conscious managers who are aware of the design stamina hypothesis will encourage regular refactoring and watch for signs of too little refactoring, without needing to be convinced. In practice, too much refactoring is much rarer than too little.

Software development is a profession and developers are expected to create effective and robust software rapidly. Typically, stakeholders are not consulted when it comes to details such as breaking down code into functions or files during development. The decision for or against refactoring in a specific case is likewise one that the developers should make themselves. One principle of the Agile Manifesto states:

"The best architectures, requirements, and designs emerge from self-organizing teams."

In essence, if technical awareness is not present on the stakeholder side, the advice is to not discuss refactoring with them (2). Ensuring quality and being able to add new capabilities quickly are reasonable goals and just some of the advantages of refactoring. Only code that is easy to read and to understand can be extended efficiently. Refactoring must be part of continuous development.

Literature

(1) https://martinfowler.com/bliki/DefinitionOfRefactoring.html

(2) Refactoring: Improving the Design of Existing Code. M. Fowler. Addison-Wesley, Boston, MA, USA, (2019).