I’ve been working on re-platforming the online banking system for a Canadian retail bank for over two years. Leading up to the initial launch, The Business decided to defer a few features to the next release, which was scheduled for about a month later. We created a release branch to support fixes for defects found in testing, and the development team continued adding features to the develop branch. Bug fixes from the release branch were merged into develop regularly. The process was working.
We merged the release branch into main when the product launched in July 2019. The launch went smoothly; we fixed any issues on the release branch, pushed that branch to production, and merged the fixes back into develop. We were releasing patches regularly and everyone was happy.
Testing on the new features began shortly thereafter, but the scope kept growing and many of the new features required significant refactoring. We also continued doing regular production releases of minor enhancements and bug fixes from the release branch, but the process was starting to show cracks. The code on develop was not production-ready, and some features were blocked by third-party dependencies. To make matters worse, merging from release into develop was becoming increasingly time-consuming and error-prone because the two branches had diverged significantly.
The Business asked whether they could launch some of the new features that had passed testing, but we couldn’t justify the time and risk of cherry-picking that code onto the release branch. At this point, The Business decided to draw a line in the sand: they didn’t want any more new features added to the develop branch so that we could stabilise it and release it, so we created a release-2 branch. Welcome to merge hell. Every production fix now had to be merged from release into release-2, and then into develop.
The process had become unsustainable. We could not afford to have developers spending so much time merging production bug fixes, and we needed a way to get features out the door when they were ready. Under our branching strategy at the time, The Business had to decide ahead of time which release a feature should be included in. If that feature got held up for any reason, the entire release had to wait.
After much research and deliberation, I started lobbying to switch to trunk-based development. To my surprise, I encountered opposition from both The Business and the development team. The Business was wary of hiding incomplete work behind feature toggles, and the developers didn’t want their codebase to become a rat’s nest of “if” statements to support those feature toggles.
I was able to win over The Business because feature toggles put them in control of when a feature becomes available to our customers. For the developers, we agreed on a simple process to ensure that each feature toggle is cleaned up and removed shortly after its feature launches.
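The cleanup process itself isn’t spelled out here. Purely as a hypothetical sketch, one way to make such a rule enforceable is to give every toggle an owner and a removal deadline, and then have a build check flag any toggle that outlives its deadline. The `Toggle` record, its `owner` and `remove_by` fields, the team names and the dates below are all illustrative assumptions. They are not a description of the actual process.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Toggle:
    name: str
    owner: str       # hypothetical: team responsible for removing the toggle
    remove_by: date  # hypothetical: agreed deadline for deleting the toggle


def overdue_toggles(toggles: List[Toggle], today: date) -> List[str]:
    """Return the names of toggles whose removal deadline has passed."""
    return [t.name for t in toggles if today > t.remove_by]


def check_toggle_cleanup(toggles: List[Toggle], today: date) -> None:
    """Raise if any toggle has outlived its deadline, e.g. to fail a CI build."""
    overdue = overdue_toggles(toggles, today)
    if overdue:
        raise RuntimeError(
            f"Feature toggles past their removal date: {', '.join(overdue)}"
        )


if __name__ == "__main__":
    # Illustrative registry; names, owners and dates are made up.
    registry = [
        Toggle("new-transfers", "payments-team", date(2021, 1, 31)),
        Toggle("e-statements", "accounts-team", date(2021, 6, 30)),
    ]
    print(overdue_toggles(registry, date(2021, 3, 1)))  # ['new-transfers']
```

Running a check like this in CI turns "remove the toggle shortly after launch" into something the build enforces, not a rule that depends on someone remembering it.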
Trunk-based development has been a game-changer for our team. Merge conflicts are virtually non-existent, we do production releases every two weeks, and The Business loves being able to control when features go live.
It wasn’t all sunshine and roses, though. There’s a sweet spot when it comes to the size of a feature toggle: how much code should a single toggle control? We settled on defining their boundaries the same way The Business does when they discuss a feature: it’s something they would launch as an atomic unit. Feature toggles are a form of intentional technical debt, so they should be neither too granular nor too coarse. We also strive to find a single entry point for each feature to avoid cluttering the code with “if” statements.
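To illustrate the single-entry-point idea, here is a minimal Python sketch. The toggle name `new-transfers`, the `FeatureToggles` store and the transfer-page functions are all hypothetical. In a real system the flags would more likely come from configuration or a toggle service than from an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store. A real system would more likely
    read these flags from configuration or a toggle service so they can be
    flipped without a deployment."""

    flags: Dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown toggles default to off, so unfinished work stays hidden.
        return self.flags.get(name, False)


def legacy_transfer_page(account_id: str) -> str:
    # Existing behaviour, left unchanged while the new feature is built.
    return f"legacy transfer page for {account_id}"


def new_transfer_page(account_id: str) -> str:
    # New feature code; nothing below this point checks the toggle.
    return f"new transfer page for {account_id}"


def transfer_page(toggles: FeatureToggles, account_id: str) -> str:
    # Single entry point: the only place that checks the "new-transfers"
    # toggle, so its "if" statement is not scattered through the codebase.
    if toggles.is_enabled("new-transfers"):
        return new_transfer_page(account_id)
    return legacy_transfer_page(account_id)


if __name__ == "__main__":
    toggles = FeatureToggles({"new-transfers": False})
    print(transfer_page(toggles, "12345"))  # legacy transfer page for 12345
    toggles.flags["new-transfers"] = True   # the feature is switched on
    print(transfer_page(toggles, "12345"))  # new transfer page for 12345
```

Because the toggle is checked in exactly one place, removing it after launch means deleting one branch and the legacy function. There is no need to hunt down conditionals spread across the code.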
We’ve been using trunk-based development for about 18 months, and honestly, I can’t imagine going back to Git Flow.