Packaging by layer versus packaging by feature
This post discusses an issue that has been around for some time. Yet, while supporting recruitment over the past few years, I’ve found that the vast majority of software engineers manage to overlook it: people still tend to over-engineer the packaging strategy of an application they’ve built in only 30 minutes as part of an interview process.
To shed some light on the problem and add a little context, I’ll run through an illustrative scenario: transforming a monolithic application into microservices. Some readers will be aware of the pros and cons of a microservices architecture; others will know that, to minimise risk or to refactor in small iterations in an enterprise environment, there is a strategy called ‘Monolith First’. In this scenario, the development team designed an application intended to be composed of microservices. However, it was initially built as a monolith, and the team is about to start the migration process.
A number of developers learned to code at university; some learned with “Hello World” tutorials and others with textbooks. In each case, the principles and techniques are typically taught within a simplified context. When building complex systems, some developers tend to repeat the approach they were initially taught, without much evolution or change in method.
The illustrative system is based on a software architecture involving three components and is presented like so:
The system was developed with Test Driven Development (TDD) and Domain Driven Design (DDD) to ensure the code is well designed. It follows the single responsibility principle and the other SOLID principles, and the transaction boundaries do not exceed the bounded contexts defined for the three components.
Every programming language and paradigm may treat the concept of a software package differently. Assuming the components were developed as a monolith-first Java web app, the classes can be logically organised into Java packages that give them namespaces (technically folder hierarchies) and can also be physically packaged into Java archives (JAR, WAR, EAR). With the monolith-first approach, all classes are assembled and distributed as one archive, but let’s imagine that internally these classes were structured with the following packaging strategy:
After an assessment, the team (or architect) decides that the time is right to refactor the solution and split the components into three microservices. During this process, the team realises there are some issues with the structure.
It’s possible to identify the classes that are prefixed with the names of the components, and it’s reasonable to assume these are the starting points of the potential public APIs. In reality, with this structure, potentially all interfaces must be public. For example, to give access to the repository layer from another package, its interface must be public. In fact, every layer’s interface must be public, which raises the question: where is the public API of each of the three components?
What about internal components? By analysing class name suffixes, it’s possible to identify the use of the strategy pattern. Which component depends on the strategy? To find out, it is necessary to navigate through the usages of the interface. When code is structured in this manner, developers get used to declaring all interfaces and classes as public. Defensive programming is lost, and there is a very real possibility that the implementations of the strategies are also public and could be referenced directly. In reality, the strategy interface is probably a design decision that falls under the category of implementation detail. Usually only the service or domain object holding the strategy context should be aware of it.
In addition, the package names make it fair to assume that the Model-View-Controller (MVC) pattern is used. Assume that the service is a data-centric REST API and that its main client is a frontend Single Page Application (SPA) that also implements MVC. Using this pattern in a backend REST API is usually an overhead and also a reflection of the over-engineering of the structure. Data-centric REST APIs react to requests for resources, so there’s no actual view: a rendition (or adaptation) of the object itself is usually returned. A controller isn’t necessary either, as the backend doesn’t manage a user interface and the actions are inferred from HTTP verbs and paths. In the end, the model becomes an anaemic service implementation that simply delegates calls to the repository. Services should have a greater purpose than acting as a facade over CRUD operations for domain objects already managed by repositories.
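To make the difference between an anaemic service and a purposeful one concrete, here is a minimal sketch. Every name in it (Account, AccountRepository, AnaemicAccountService, WithdrawalService) is hypothetical and not taken from the scenario’s code base, and the in-memory repository merely stands in for real persistence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Every name in this sketch is hypothetical and only illustrates the idea.
final class WithdrawalDemo {
    public static void main(String[] args) {
        AccountRepository repository = new InMemoryAccountRepository();
        repository.save(new Account("a1", 5_000));
        Account updated = new WithdrawalService(repository).withdraw("a1", 1_200);
        System.out.println(updated.balanceInCents); // 5000 - 1200
    }
}

final class Account {
    final String id;
    final long balanceInCents;

    Account(String id, long balanceInCents) {
        this.id = id;
        this.balanceInCents = balanceInCents;
    }
}

interface AccountRepository {
    Optional<Account> findById(String id);

    void save(Account account);
}

/** In-memory stand-in for a real persistence layer. */
final class InMemoryAccountRepository implements AccountRepository {
    private final Map<String, Account> store = new HashMap<>();

    @Override
    public Optional<Account> findById(String id) {
        return Optional.ofNullable(store.get(id));
    }

    @Override
    public void save(Account account) {
        store.put(account.id, account);
    }
}

/** Anaemic: every method only forwards to the repository. */
final class AnaemicAccountService {
    private final AccountRepository repository;

    AnaemicAccountService(AccountRepository repository) {
        this.repository = repository;
    }

    Optional<Account> findById(String id) {
        return repository.findById(id);
    }

    void save(Account account) {
        repository.save(account);
    }
}

/** Purposeful: the service owns a rule the repository knows nothing about. */
final class WithdrawalService {
    private final AccountRepository repository;

    WithdrawalService(AccountRepository repository) {
        this.repository = repository;
    }

    Account withdraw(String accountId, long amountInCents) {
        Account account = repository.findById(accountId)
                .orElseThrow(() -> new IllegalArgumentException("Unknown account: " + accountId));
        // Business rule: the amount must be positive and covered by the balance.
        if (amountInCents <= 0 || amountInCents > account.balanceInCents) {
            throw new IllegalArgumentException("Invalid withdrawal amount: " + amountInCents);
        }
        Account updated = new Account(account.id, account.balanceInCents - amountInCents);
        repository.save(updated);
        return updated;
    }
}
```

If AnaemicAccountService were deleted, its callers could use the repository directly and nothing would be lost; removing WithdrawalService would lose the rule itself.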
To conclude the analysis of this approach: how can we ensure developers respect the service boundaries? With this packaging strategy, it’s easy to overlook during a code review that a service in one component has become dependent on a repository or other internal class of another component. All interfaces have to be public in the first place, so the dependency can be added without changing the target interface. Should services consume multiple repositories, developers would have to be very careful to respect the transactional boundaries across the multiple bounded contexts. The microservices refactoring process will be considerably more expensive if there are atomic data operations spanning repositories that end up living in separate services.
Is there a better packaging strategy that helps to protect internal resources, promote SOLID principles and enforce transactional boundaries, while at the same time making the process of breaking up the monolith smoother? Yes: packaging by feature and NOT by layer. Layer is indeed an important concept; however, for logical division we can identify layers by using suffixes in class names. When physical division is necessary (rare in microservices architectures), we still structure the classes under the same package name (effectively the namespace) but assemble each layer into a separate archive file, so that it can be distributed separately.
Going back to our scenario, let’s imagine the same monolith application structured with the following packaging strategy:
Unlike the previous strategy, there’s an obvious separation between the three concerns, and it’s possible to assemble the three packages into three separate archives without touching the code (besides dependency management in the build tool settings).
In Java, one important benefit of using packages is that, no matter how many archives the classes are assembled into, it’s possible to protect all internal implementation details: every interface that is not part of the public API can (and should) be non-public. In the Java core libraries, as in various open source projects, it’s common to find packages with a large number of classes, so a Java archive can contain a single package.
With this strategy, the number of classes in each package is larger, but there is a clear distinction between the components’ public interfaces and their implementation details. Any sub-component consumed by more than one of the three components first needs to be isolated in another package and made public so that the other packages can access it. That way, the required reusable library is obvious.
When package access level is used as a defensive programming device, the tightly coupled scenarios described earlier are limited and code reviewers get much better visibility of anyone falling into the trap. For a class to become dependent on a non-public member of another component (package), that member’s access level must intentionally be changed to public. Developers are discouraged from using another component’s non-public interfaces or any of their implementations; it is therefore all but certain that the repositories and strategies are dependencies only of the classes they were meant for.
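The sketch below illustrates package access as a defensive device. It assumes a hypothetical billing feature; none of the names come from the scenario. In a real project these types would live in one package (for example com.example.shop.billing, also hypothetical) with one file per type. They are nested in a holder class here only so that the example compiles as a single file, which means the access modifiers show the intent rather than true cross-package enforcement.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo: uses the feature only through its public API.
final class BillingFeatureDemo {
    public static void main(String[] args) {
        BillingFeature.BillingService billing = BillingFeature.create();
        System.out.println(billing.totalInCents(2_000)); // 2000 minus a 10% discount

        // Reflection reports which types were declared public.
        System.out.println(Modifier.isPublic(BillingFeature.BillingService.class.getModifiers()));
        System.out.println(Modifier.isPublic(BillingFeature.DiscountStrategy.class.getModifiers()));
    }
}

// Stand-in for a feature package; each nested type would be its own file.
final class BillingFeature {

    private BillingFeature() { }

    /** Public API: the only type other features are expected to depend on. */
    public interface BillingService {
        long totalInCents(long amountInCents);
    }

    /** Public factory, so callers never name the implementation classes. */
    public static BillingService create() {
        return new DefaultBillingService(new FlatDiscountStrategy(10));
    }

    /** Implementation detail: no modifier means package access only. */
    interface DiscountStrategy {
        long discountInCents(long amountInCents);
    }

    /** Implementation detail: a concrete strategy, not declared public. */
    static final class FlatDiscountStrategy implements DiscountStrategy {
        private final int percent;

        FlatDiscountStrategy(int percent) {
            this.percent = percent;
        }

        @Override
        public long discountInCents(long amountInCents) {
            return amountInCents * percent / 100;
        }
    }

    /** Implementation detail: the service that holds the strategy context. */
    static final class DefaultBillingService implements BillingService {
        private final DiscountStrategy strategy;

        DefaultBillingService(DiscountStrategy strategy) {
            this.strategy = strategy;
        }

        @Override
        public long totalInCents(long amountInCents) {
            return amountInCents - strategy.discountInCents(amountInCents);
        }
    }
}
```

With the types split into real files in one package, a class in another feature’s package that tried to reference DiscountStrategy or FlatDiscountStrategy would fail to compile until someone deliberately added the public modifier, and that change is what a reviewer would spot.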
This approach does not technically prevent the transactional boundaries issue I described earlier, but the structure of the code reinforces the developer’s awareness of the bounded context around each of the three concerns in the scenario. With package names kept aligned with the architecture component documentation, both the higher-level abstraction and the source code can be used to support design analysis of the direction of communication, circular dependencies and other concerns, which can in turn lead to discussions of SOLID principles and of component roles and responsibilities.
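As one example of the kind of design analysis that becomes straightforward once packages map to components, here is a small sketch that looks for circular dependencies. It assumes the dependencies between feature packages have already been collected into a map. The class name PackageCycleCheck and the package names in main are hypothetical, and the depth-first search is a generic technique rather than something taken from the scenario.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: looks for cycles in a map of
// "feature package -> packages it depends on".
final class PackageCycleCheck {

    private PackageCycleCheck() { }

    public static void main(String[] args) {
        Map<String, List<String>> dependencies = Map.of(
                "orders", List.of("billing"),
                "billing", List.of("shipping"),
                "shipping", List.of());
        System.out.println(hasCycle(dependencies));
    }

    /** Returns true if any package can reach itself by following its dependencies. */
    static boolean hasCycle(Map<String, List<String>> dependencies) {
        Set<String> inProgress = new HashSet<>();
        Set<String> finished = new HashSet<>();
        for (String pkg : dependencies.keySet()) {
            if (visit(pkg, dependencies, inProgress, finished)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String pkg, Map<String, List<String>> dependencies,
                                 Set<String> inProgress, Set<String> finished) {
        if (finished.contains(pkg)) {
            return false; // fully explored earlier, no cycle through this package
        }
        if (!inProgress.add(pkg)) {
            return true; // reached a package that is still on the current path
        }
        for (String dependency : dependencies.getOrDefault(pkg, List.of())) {
            if (visit(dependency, dependencies, inProgress, finished)) {
                return true;
            }
        }
        inProgress.remove(pkg);
        finished.add(pkg);
        return false;
    }
}
```

With packaging by layer, a map like this would mostly show every layer depending on the one below it, which says little about the components; with packaging by feature, each entry corresponds to a component in the architecture documentation.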
The aspects discussed above are the reasons why structuring the code by layer is an unnecessary, over-engineered mistake, and why structuring by feature (mainly by architectural component) is not only clearer but also simplifies modularisation and improves defensive programming. In the second packaging strategy, the MVC classes were retained despite being a potential source of over-engineering. I’ll discuss the alternatives in a separate article about data-centric microservices and front-end applications.