This article is adapted from the Squad Health Check model used at Spotify, as described by Henrik Kniberg.
A lot of companies experiment with ways of measuring and visualizing how their teams are doing. These are usually called “maturity models” and involve some sort of progression through different levels. Spotify instead uses a squad health check model, and this article is based on that model.
The intent of these types of models is usually to help teams become more self-aware so they can focus their improvement efforts.
How this squad health check model works
When checking the health of a squad, there are really two stakeholders:
- The squad itself. While discussing the different health indicators, the squad builds up self-awareness about what’s working and what’s not. The broad selection of questions helps expand their perspective. Perhaps they were well aware of the code quality issues, but hadn’t really thought about the customer value perspective, or how fast they learn. It also provides a balanced perspective, showing the good stuff as well as the pain points.
- People supporting the squad. Managers and coaches who work outside (or partly outside) the squad get a high-level summary of what’s working and what’s not. They can also see patterns across multiple squads. If you have dozens of teams and can’t talk to everyone about everything, a visual summary like this helps you figure out how to spend your time, and who to talk to about what.
The first step in solving a problem is to be aware of it, and this type of visualization makes it harder for anyone to ignore the problem.
What we should do
- Run workshops where members of a squad discuss and assess their current situation based on a number of different perspectives (quality, fun, value, etc.)
- Create a graphical summary of the result
- Use the data to help the squads improve
Here’s a real-life example of health check output for one tribe:
It shows how 7 different squads in a tribe see their own situation. Color shows the current state (green = good, yellow = some problems, red = really bad). The arrow shows the trend (is this generally improving or getting worse?).
Stare at it for a minute, and you start seeing some interesting things:
- Scan each column, and you see some major differences between squads. Squad 4 is happy with just about everything. Squad 2 is having lots of trouble, but most things are improving.
- Scan each row, and you see systemic patterns. Every squad is having fun (and it’s even improving)! Motivation is apparently not a problem at all. But the release process is problematic, and the codebase is overall in bad shape. Over time, that will probably reduce the Fun as well.
- Scan the overall picture, and you see that just about every arrow is up, with only two down arrows in the whole picture. That means the improvement process (the most important process of all) seems to be working.
This is, of course, just an approximation of reality (“all models are wrong, but some are useful” – George Box). So it’s worth double-checking things before taking action.
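To make the column/row/overall reading concrete, here is a minimal Python sketch that stores the grid as nested dictionaries (indicator, then squad, then cell) and summarizes one row, one column, or all the arrows. The `Cell` type, the function names and the sample values are hypothetical and made up for illustration; they are not the tribe’s actual results.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical encoding of one grid cell: color = current state, trend = arrow.
@dataclass(frozen=True)
class Cell:
    color: str  # "green", "yellow" or "red"
    trend: str  # "up", "stable" or "down"


# Made-up sample data: indicator -> squad -> Cell (not real health check output).
grid = {
    "Easy to release": {
        "Squad 1": Cell("red", "up"),
        "Squad 2": Cell("red", "up"),
        "Squad 4": Cell("green", "stable"),
    },
    "Fun": {
        "Squad 1": Cell("green", "up"),
        "Squad 2": Cell("green", "up"),
        "Squad 4": Cell("green", "up"),
    },
    "Tech quality": {
        "Squad 1": Cell("yellow", "down"),
        "Squad 2": Cell("red", "up"),
        "Squad 4": Cell("green", "stable"),
    },
}


def row_summary(grid, indicator):
    """Scan a row: how all squads rate one indicator (systemic patterns)."""
    return Counter(cell.color for cell in grid[indicator].values())


def column_summary(grid, squad):
    """Scan a column: how one squad rates every indicator."""
    return Counter(row[squad].color for row in grid.values() if squad in row)


def trend_summary(grid):
    """Scan the whole picture: count the arrows across all cells."""
    return Counter(cell.trend for row in grid.values() for cell in row.values())
```

With this sample data, `row_summary(grid, "Easy to release")` counts two reds and one green, a systemic signal, while `trend_summary(grid)` shows mostly up arrows. The counts only point to where to look; as the next paragraph notes, the underlying data is still an approximation.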
Is Squad 4 really in such great shape, or are they just optimistic and not seeing their own problems? Most squads think they are delivering good value to their customers – but how do they know? Is that based on wishful thinking or real customer feedback?
In this particular case, Squad 4 had been formed just a week before the health check and was definitely in the forming phase, or “on honeymoon”. So both Squad 2 and Squad 4 needed a lot of support.
“Easy to release” was clearly a major issue, so this led to a bigger focus on things like continuous delivery, and we’ve seen some good progress there.
Note that this is a self-assessment model, based entirely on the honesty and subjective opinions of the people in the squads. So it only works in a high-trust environment, where people trust their managers and colleagues to act in their best interest. The data is easy to game, so the key is to make sure there is no incentive to do so.
To gather the data, we can use a physical deck of “Awesome Cards”, where each card represents one health indicator with an “Example of Awesome” and an “Example of Crappy”.
The deck typically has around 10 cards. Here is an example of a complete deck:
| Area | Example of Awesome | Example of Crappy |
|---|---|---|
| Easy to release | Releasing is simple, safe, painless & mostly automated. | Releasing is risky, painful, lots of manual work, and takes forever. |
| Suitable process | Our way of working fits us perfectly. | Our way of working sucks. |
| Tech quality (code base health) | We’re proud of the quality of our code! It is clean, easy to read, and has great test coverage. | Our code is a pile of dung, and technical debt is raging out of control. |
| Value | We deliver great stuff! We’re proud of it and our stakeholders are really happy. | We deliver crap. We feel ashamed to deliver it. Our stakeholders hate us. |
| Speed | We get stuff done really quickly. No waiting, no delays. | We never seem to get done with anything. We keep getting stuck or interrupted. Stories keep getting stuck on dependencies. |
| Mission | We know exactly why we are here, and we are really excited about it. | We have no idea why we are here; there is no high-level picture or focus. Our so-called mission is completely unclear and uninspiring. |
| Fun | We love going to work, and have great fun working together. | Boooooooring. |
| Learning | We’re learning lots of interesting stuff all the time! | We never have time to learn anything. |
| Support | We always get great support & help when we ask for it! | We keep getting stuck because we can’t get the support & help that we ask for. |
| Pawns or players | We are in control of our destiny! We decide what to build and how to build it. | We are just pawns in a game of chess, with no influence over what we build or how we build it. |
For each card, the squad is asked to discuss whether they are closer to “awesome” or to “crappy”, and we use basic workshop techniques (dot voting, etc.) to help them reach consensus on which color to choose for that indicator and what the trend is (stable, improving, or getting worse).
We like to keep it at three levels (green/yellow/red) for simplicity. The exact definitions of the colors will vary, but something like this:
- Green doesn’t necessarily mean things are perfect. It just means the squad is happy with this, and sees no major need for improvement right now.
- Yellow means there are some important problems that need addressing, but it’s not a disaster.
- Red means this really sucks and needs to be improved.
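Dot voting yields individual votes that the squad then discusses. As a hypothetical aid for the facilitator, the small Python helper below suggests a starting color for one card from those votes. The function name and the tie-breaking rule (on a tie, prefer the worse color so disagreement gets discussed rather than hidden) are assumptions for illustration, not part of the model; the squad still settles the final color by consensus.

```python
from collections import Counter

# Colors ordered from worst to best; used to break ties pessimistically.
SEVERITY = ["red", "yellow", "green"]


def suggest_color(votes):
    """Suggest a starting color for one card from individual dot votes.

    Hypothetical helper: returns the most-voted color; on a tie, returns
    the worse of the tied colors. It does not replace the squad's discussion.
    """
    if not votes:
        raise ValueError("no votes cast")
    counts = Counter(votes)
    unknown = set(counts) - set(SEVERITY)
    if unknown:
        raise ValueError(f"unknown colors: {sorted(unknown)}")
    top = max(counts.values())
    # First color in worst-to-best order that has the top vote count.
    return next(color for color in SEVERITY if counts[color] == top)
```

For example, `suggest_color(["green", "green", "yellow", "red"])` suggests green, while a split such as `["green", "red"]` suggests red, which is exactly the kind of disagreement worth talking through before the squad commits to a color.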
Yes, this is subjective data. In theory, a squad may choose to bring in hard data (cycle time, defect count, velocity, etc.), but few do, because even with hard data the squad needs to interpret it and decide whether it indicates a problem. So at the end of the day, everything is subjective anyway. If something feels like a problem, that in itself is a problem.
Sometimes we combine this with retrospectives, for example by voting on one card and deciding on actions to improve that area.
Try this squad health check model, then adapt it to your own needs. The most important thing is to make sure continuous improvement is facilitated and actually happens in the team.