A moral hazard is when an actor has an incentive to increase its exposure to risk because it does not bear the full costs of that risk.
It’s similar to perverse incentives.
A perverse incentive is an incentive that has an unintended and undesirable result that is contrary to the intentions of its designers.
I have spent the last six months working closely with a QA team inside a large financial institution and have become increasingly skeptical of the value they provide - the development team don’t test as thoroughly as they should because they know that QA are playing backstop.
Enterprise scrum shoulders a portion of the responsibility here. The relentless focus on “velocity” encourages teams to close stories quickly and punt the testing responsibility downstream. In my current organisation, there is an in-pod QA person who tests each story before it gets closed but, again, they know that they’re measured on velocity alone, and that there’s still a multi-month regression cycle that will catch anything they miss.
The QA safety net is a moral hazard.
The executives are focused on reducing the QA cycle time through automation, and while that may help, I don’t think it’ll move the needle or change the dev team’s mindset towards quality.
In my opinion, we have to assume that all code has bugs and that QA cannot catch everything. We need to reframe how we think about releases from trying to catch all bugs to reducing the blast radius of the bugs that make it to production.
The funding would be better allocated to building a modern CI/CD pipeline, including:
- Fast, automated rollbacks
- Feature flags
- Backwards-compatibility
- Short release cycles
Automated rollbacks significantly reduce the risk associated with bugs. If you can roll back in minutes, a bug that reaches production is only live for as long as it takes to notice it and revert.
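To make the idea concrete, here is a minimal sketch of a deploy step that reverts automatically when a health check fails. Everything in it is hypothetical: `deploy_with_rollback`, the `history` list standing in for a real deployment system, and the `is_healthy` callback standing in for whatever post-release checks you run.

```python
from typing import Callable, List


def deploy_with_rollback(
    history: List[str],
    new_version: str,
    is_healthy: Callable[[str], bool],
) -> str:
    """Deploy new_version and revert to the previous version if it is unhealthy.

    `history` is a stand-in for the deployment system: its last entry is
    the version currently serving traffic. Returns the version left live.
    """
    previous = history[-1]
    history.append(new_version)  # "deploy": the new version becomes current

    if is_healthy(new_version):
        return new_version

    history.pop()  # automated rollback: the previous version is current again
    return previous


history = ["v1"]
live = deploy_with_rollback(history, "v2", is_healthy=lambda version: False)
print(live, history)
```

The point is that the rollback decision is made by the pipeline, not by a person paged at 2am, which is what keeps it down to minutes.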
Feature flags allow buggy features to be disabled quickly, and prevent buggy features from holding up releases. They also facilitate smoke testing in production before features go live.
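As a rough illustration, a feature flag can be as simple as a lookup that guards the new code path. This is a sketch under assumptions, not a recommendation of any particular flag product: the `FeatureFlags` class, the `new_fee_model` flag, the `transfer_fee` function and its fee values are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FeatureFlags:
    """In-memory flag store; a real system would load this from config or a flag service."""

    enabled: Dict[str, bool] = field(default_factory=dict)
    # Users who get a feature while it is still off for everyone else (smoke testers).
    allowlist: Dict[str, Set[str]] = field(default_factory=dict)

    def is_enabled(self, flag: str, user: str = "") -> bool:
        if self.enabled.get(flag, False):
            return True
        return user in self.allowlist.get(flag, set())


def transfer_fee(flags: FeatureFlags, amount: float, user: str) -> float:
    """Hypothetical business logic with the new behaviour behind a flag."""
    if flags.is_enabled("new_fee_model", user):
        return round(amount * 0.01, 2)  # new, possibly buggy, code path
    return 1.50  # existing behaviour


flags = FeatureFlags(allowlist={"new_fee_model": {"qa-tester"}})
print(transfer_fee(flags, 200.0, "customer"))   # existing path
print(transfer_fee(flags, 200.0, "qa-tester"))  # new path, smoke-tested in production

flags.enabled["new_fee_model"] = True   # go live for everyone
flags.enabled["new_fee_model"] = False  # kill switch if a bug shows up
```

The code ships to production switched off, the allowlisted tester exercises it there, and turning it on or off is a config change rather than a release.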
Backwards-compatibility means that components can be released (and tested) independently. Your database change can go to production the day before the code that uses it. The downstream API can go live before its consumers. Rollbacks are much simpler when you don’t have to co-ordinate across multiple teams and organisational hierarchies.
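One common way to get there is for consumers to tolerate both the old and the new shape of the data they read. The sketch below assumes a hypothetical account payload that gains an optional `nickname` field; the function and field names are illustrative only.

```python
from typing import Any, Dict


def read_account(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Tolerant reader: accepts payloads from before and after the new field shipped."""
    return {
        "id": payload["id"],
        # "nickname" is the hypothetical new field; fall back to a default when absent.
        "nickname": payload.get("nickname", ""),
        # Any other unknown fields are ignored rather than rejected.
    }


old_payload = {"id": "acc-1"}
new_payload = {"id": "acc-1", "nickname": "Savings", "added_later": True}

print(read_account(old_payload))
print(read_account(new_payload))
```

Because the reader works against either version, the producer and the consumer can each be released, and rolled back, on their own schedule.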
These three items all converge in shorter release cycles, which reduce the size of each release. Customers get code sooner, the risk of each release is lowered because it has fewer changes, and the work is still fresh in the minds of the devs who built it if things go wrong.
This last point bears repeating. If your release cycle is measured in months then there’s a reasonable chance that the dev who worked on any given feature will no longer be with the company by the time the feature goes live, leaving you at the mercy of the reverse engineering skills of the new hire. This is especially true in Canadian banks that rely heavily on contract developers.
I’m not advocating that QA should be abolished. I’d like to see QA time-boxed to a short period of exploratory testing. They would not be expected to perform a full regression test, but should cover the most common flows and do some high-level testing of any new features.