Bug management that works (Part 1)
👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. If you’ve been forwarded this email, you can subscribe here.

Finding and triaging bugs, fixing bugs on the spot instead of ‘managing’ them, and how to make time for bug fixing.

Before we start: The Pragmatic Engineer Podcast launched last week with episode #1: AI tools for software engineers, but without the hype. New episodes come every other Wednesday. Thank you to everyone who listened to this first one. If you enjoy podcasts, please do add it on Apple, Spotify, YouTube, or your favorite player – episodes will show up automatically, and you’ll be helping the show.

How do you deal with bugs in the software products you build? This topic seems very under-discussed, yet it affects pretty much every software team. To find out what professionals think, we reached out to two dozen engineering leaders and software engineers, who kindly shared approaches that work for their teams and organizations. The topic is evergreen, and one with outsized (potentially decisive!) impact on product quality and user experience. In this issue, we cover:
As a refresher, we have a few previous deep dives related to this topic:

Thank you to everyone who contributed insights to this article: Ahmed Saher (engineering manager), Anaïs van Asselt (senior QA engineer), Andrea Sipos (product leader), Bernd Kampl (Software Engineering Team Lead), Jason Diller (VP of Engineering), John Cutler (product leader), Magnus L. Udbjørg (CTO), Michał Borek (Principal Engineer), Rebecca Frost (QA leader), Rebecca Holm Ring (engineering leader), Ruben Weijers (engineering manager), Ryan Hanni (Director of Engineering), Serdar Biyik (engineering manager), Walter de Bruijn (Head of Engineering Productivity)

1. Finding bugs

How can we be confident that the software we release has no known issues? We need to validate that it works correctly, and there are several common approaches for this.

Dogfood products. “Dogfooding” is the common practice of devs and other employees using a product while they are building it, pre-release. For example, when I worked at Uber, the company issued free credits for staff to use the internal beta app for rides and food deliveries. At Skype, we ran internal beta versions of Skype for all internal chat and video calls. The business gave Skype credits to employees, so we could dogfood paid features like landline calls. Spotify does the same, as Rebecca Holm Ring, a former engineering manager there, shares:
Invest in test automation. Anaïs van Asselt – senior QA engineer at Choco – shares their approach:
At smaller companies, be close to users. Smaller workplaces tend to be naturally closer to their users, and can use this to build relationships with users who become invested in the product and in reporting bugs. Bernd Kampl – Software Engineering Team Lead at Anyline, a smaller car tech AI company – shares:
Magnus Udbjørg is CTO of Testaviva, a 50-person startup in Denmark. His take is that it’s optimal to build trust with users so they report issues:
A fair question: why don’t these companies do lots of testing themselves? The smaller the company and the fewer its customers, the more expensive it feels to invest heavily in testing early on. Of course, there are always counterexamples, like how Figma spent nearly three years iterating on its first release in order to get the performance of its collaborative, web-based editor right and give users a fun “wow moment.” It’s worth noting that Figma is a product the dev team used continuously while developing it, so it got lots of testing during the building phase. We cover Figma’s engineering culture in a deep dive.

Consider alpha and beta testing at larger companies. Alpha and beta testing means giving customers access to unfinished, less stable versions of a product. “Alpha” usually refers to the latest build, with little to no QA testing; “beta” versions have had some testing, but not as much as a full release.
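Conceptually, a beta program is a gate on which build each user receives. Here is a minimal sketch of deterministic channel assignment, to make the idea concrete – the channel names, rollout share, and hashing scheme are invented for illustration, not any specific company’s mechanism:

```python
import hashlib

ALPHA_SHARE = 0.05  # invented: fraction of opted-in users on the alpha build

def release_channel(user_id: str, opted_into_testing: bool) -> str:
    """Deterministically assign a user to a release channel.

    Hashing the user ID keeps the assignment stable across app
    launches, so a user doesn't bounce between builds.
    """
    if not opted_into_testing:
        return "stable"
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 1000
    return "alpha" if bucket < ALPHA_SHARE * 1000 else "beta"

print(release_channel("user-42", opted_into_testing=True))
```

The deterministic hash matters: if assignment were random on each launch, users would flip between builds, and bug reports would be much harder to tie to a specific version.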
Rebecca Holm Ring shares how this worked at Spotify:

Automation: testing and code analysis. Unit tests, integration tests, end-to-end tests, and other automated tests are great ways to catch regressions: a regression is a software bug introduced into a feature after the feature was working correctly; the feature has ‘regressed’ into a faulty state. The same goes for static code analysis and other tools that automate quality assurance. We cover more on these methods in Shipping to production and QA approaches across the industry.

Code reviews. These serve multiple purposes: offering a second pair of eyes to double-check code, spreading knowledge, enforcing not-yet-automated conventions, and more. Catching bugs before they make it into the codebase is an occasional side effect. Even so, bugs can easily slip through code reviews, which are nowhere near a perfect defense against shipping bugs and regressions.

Define what a bug is. Users often report “bugs” when they mean missing features, so it can be helpful for teams to agree on what a bug is and how to categorize bugs. In general, a bug is a flaw that results in a software product behaving incorrectly. Categorizations can be granular, like splitting bugs into concurrency bugs, syntax errors, arithmetic errors, logic errors, human errors, and so on. The simplest categorization splits bugs into functional ones, where the behavior of the software is clearly wrong, and non-functional ones, which show up in things like a system slowing down, increased latency, and other harder-to-spot issues. It might be helpful to devise your own categorization, based on the types of bugs you observe, in a way that’s helpful for your product and organization.
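To make this concrete, a homegrown taxonomy can be as small as a couple of enums attached to each report. A minimal sketch – the categories, severity levels, and fields here are illustrative, not a standard:

```python
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    FUNCTIONAL = "functional"          # behavior is clearly wrong
    NON_FUNCTIONAL = "non-functional"  # slowdowns, latency, resource use

class Severity(Enum):
    CRITICAL = 1  # data loss or outage, no workaround
    MAJOR = 2     # core flow broken, workaround exists
    MINOR = 3     # cosmetic or edge-case issue

@dataclass
class Bug:
    title: str
    kind: Kind
    severity: Severity
    component: str  # e.g. "checkout", "search"

bug = Bug(
    title="Checkout spinner never resolves on slow connections",
    kind=Kind.FUNCTIONAL,
    severity=Severity.MAJOR,
    component="checkout",
)
print(f"[{bug.severity.name}] {bug.kind.value}: {bug.title}")
```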
2. Users reporting bugs

Gathering bugs can be a great source of data, providing a sense of product quality as feedback to teams, the organization, or the company. However, data quality depends on how good the bug reporting process is – and on how likely people are to report bugs at all! Great reports and data come from simple, suitable processes. Features of useful bug reports:

Bad reports create extra work, and poor bug reporting processes discourage people from recording issues in the first place; the result can be a spiral of deteriorating product quality, with the engineering team clueless about how bad things are. To avoid an outcome like that, here are some processes tech companies use to support good bug reporting.

Make it easy to create quality bug reports.
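One concrete tactic is to have the reporting tool auto-capture the environment details engineers always ask for, so the reporter only describes what they observed. A hypothetical sketch – the field names and version lookup are invented, not Miro’s or any specific tool’s API:

```python
import platform
from datetime import datetime, timezone

APP_VERSION = "1.42.0"  # placeholder: read this from your build metadata

def new_bug_report(description: str, steps: list[str]) -> dict:
    """Assemble a bug report, auto-filling the context engineers
    always need so reporters only describe what they observed."""
    return {
        "description": description,
        "steps_to_reproduce": steps,
        # Auto-captured fields: the reporter never has to look these up.
        "reported_at": datetime.now(timezone.utc).isoformat(),
        "os": platform.platform(),
        "app_version": APP_VERSION,
    }

report = new_bug_report(
    "Search returns no results for exact product names",
    ["Open search", "Type an exact product name", "Press Enter"],
)
print(report["reported_at"], report["os"])
```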
Walter de Bruijn, Head of Engineering Productivity at Miro, suggests this is critical:

QA leader Rebecca Frost on why quality bug reports count:
Make the reporting process accessible. If creating a bug report is too complicated, it discourages reporting. There are ways to make it accessible:
An example of effective engineering support comes from SF-based scaleup Ontra, shared by Director of Engineering Ryan Hanni:
Scaling bug reporting processes

There’s no one process that works best everywhere. Here are some common approaches by company size:

Smaller companies and startups: bug reports are usually simple, and the reporting process is lean because time is precious and knowledge is dense. Such workplaces are small enough that most tech folks can keep tabs on what’s happening, and people can submit bug reports pretty easily. There’s rarely a need for formal processes. Here are some efficient, less formal ones:
Mid-sized companies and scaleups: process matters more, and these places are big enough that it’s wasteful for everyone to keep tabs on reported bugs. There are also more bug reports, and collecting the same information and metadata over and over is a time waster, so bug report templates and processes matter. Good onboarding and documentation for bug processes and standards can have a big impact on efficiency.

Large companies: investing in automated processes is worthwhile, due to the size and nature of the business:
Here’s a good example of what JIRA ping pong – a bug ticket bounced back and forth between teams – looks like. Engineering manager Rebecca Holm Ring shares how it plays out at a larger company:
At larger companies, some things can help deal with an ever-growing pile of bug reports, and improve processes and tooling:
Ryan Hanni, Director of Engineering at Ontra, shares examples of manual and mostly-automated processes he’s implemented at different stages of an org’s lifecycle:

Manual process:
Partially Automated:
Mostly Automated:
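Whatever the exact toolchain, the ‘mostly automated’ end of this spectrum typically wires bug intake straight into the tracker and routes each report to an owning team without a human in the loop. A hypothetical sketch of such routing – the ownership map and queue names are invented, not Ontra’s actual setup:

```python
# Invented ownership map: product component -> owning team's queue.
OWNERS = {
    "checkout": "payments-team",
    "search": "discovery-team",
}
FALLBACK_QUEUE = "triage-rotation"  # shared rotation for unmapped areas

def route(bug: dict) -> str:
    """Send a new bug report straight to the owning team's queue,
    so no one has to shuttle tickets between teams by hand."""
    return OWNERS.get(bug.get("component", ""), FALLBACK_QUEUE)

assert route({"component": "checkout"}) == "payments-team"
assert route({"component": "3d-avatar"}) == "triage-rotation"
```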
3. Bug triage

You now have a process for bug reports to flow in, so the next step is to figure out which ones are critical, which are duplicates, and which ones not to bother with. Here are some common approaches…