Scaling engineering teams in a healthy way

Since I became an advisor for startups, I have been talking to people who are either building or need to build technology teams. One of the questions they constantly ask me is: how do we scale the team?

Creating the first team seems easy. Splitting it into two teams, or squads, is still OK. After that, however, it starts to get harder. How many squads should we build? Do we create “tribes”? Do we need managers? What is the role of the individual contributor? How do we define the scope of each squad? How fast should we grow?

During my career I’ve focused on answering those questions at the companies I have worked for. Lately, inspired by Ray Dalio’s idea that everything is a machine made of machines, I started to “codify” this machine in order to create teams that are more efficient at generating value for the company and its users.

The idea behind sharing this series of articles is that they work as an SDK for building product and engineering teams. I hope you can incorporate at least part of this “code” into your company to build teams that become more and more efficient over time. Like any other SDK, part of this code and its inspiration comes from other libraries, and I will link to the source whenever possible. That way I can give credit where credit is due and let you dig deeper into a specific topic as you see fit.

To make reading easier, I will try to address just a few questions per article. The first one I want to talk about is: when should we split our team?

Before I begin answering the question above, it is important to explain why we split at all. It looks obvious, but it’s always worth remembering that the fundamental reason companies divide the technology team is to be able to do more work in parallel. The risk we run here is opening more fronts than we can handle at any given moment and losing our capacity for focus and depth, which ends up lowering our throughput instead of raising it.

To avoid falling into this trap and to be able to divide the team efficiently, it is important to understand a concept I call the maximum throughput of a technology team.

And, to understand the throughput of this “machine” — the technology team — we have to understand its fundamental unit: the squad.

Most people in the tech world are already familiar with the concept of multidisciplinary teams, or squads. I’m not going to go deep into this concept here. Good sources on it are Inspired, one of the most complete books about the “basics” of how to build product engineering teams, and An Elegant Puzzle, which has a lot in common with the way I’ve been building teams throughout my career.

I recommend reading both. For the sake of this article, we just need to define what an ideal squad formation is.

The ideal squad is “utopian”, but we should model it to be able to understand and enhance the product engineering “machine”. The maximum throughput of a squad depends on several variables, the main ones being the number of devs in the squad, the squad’s maturity and the squad’s health.

The diagram above, taken from An Elegant Puzzle, shows the four possible stages of a squad. The ideal situation is for the squad to stay in stage 4 (innovating), which means it is working at maximum throughput. Hence, if we want our product engineering team running at maximum throughput, all squads should be in stage 4. Although this is the ideal situation, it is utopian; what we should aim for is getting as many squads as possible to this stage.

When we look at a squad, it is composed of people with specific roles: software engineers (or devs), product managers and engineering managers.

In my model we have one product manager responsible for each squad, and the number of devs varies from four to six. We will talk about the engineering manager in another article. Looking at the number of devs, if we have fewer than four I would not consider that a fully formed squad, and we would certainly be in stage 1 (falling behind). In that case, we can increase the squad’s throughput by adding more people. However, this is also something we should discuss further in another article.

Another important variable is the squad’s maturity, which is basically made up of two factors: the seniority of the team members and whether they already have the dynamics of an efficient team. More about that in the next article, on squad formation. I know, I’m being repetitive 😃

The last variable we should look at is the squad’s health. Anyone who has worked with software for some time knows it is not actually Cartesian 😉. However, we can build what I call a “squad cockpit”: a set of basic health metrics. Those metrics are good health indicators for a team; they tell us the state of the squad and guide us in the search for good health. I am also not going into much detail about that now, but a good source for performance metrics is the book Accelerate. In my last teams I’ve used a mixture of metrics from this book and some others that help in understanding value creation for the user, the amount of technical debt and toil, among others. Unfortunately, we will have to leave that for another day as well so we can focus on when and how to split the team.

In most engineering articles we end up finding formulas. This one is no different 😉

squad_throughput = number_of_devs*squad_maturity*squad_health

The formula above describes an indicator that represents the squad_throughput. I believe it is clear by now that our goal as engineering leaders is to try to maximize the throughput of all squads in our product engineering team. Of course, that maximum throughput is utopian. However, in my experience, the constant search for it enables constant growth of the team’s throughput and, as illustrated in Accelerate, this has a direct impact on your company’s results.
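As a minimal sketch, the formula can be written in Python. The article does not fix a scale for maturity and health, so this example assumes both are factors between 0 and 1; that normalization, the `Squad` class and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Squad:
    name: str
    number_of_devs: int
    maturity: float  # assumed scale: 0.0 (unbalanced) to 1.0 (balanced)
    health: float    # assumed scale: 0.0 (unhealthy) to 1.0 (healthy)


def squad_throughput(squad: Squad) -> float:
    """Indicator from the formula: number_of_devs * maturity * health."""
    return squad.number_of_devs * squad.maturity * squad.health


# Hypothetical squad: five devs, balanced, but only half as healthy as it could be.
checkout = Squad("checkout", number_of_devs=5, maturity=1.0, health=0.5)
print(squad_throughput(checkout))  # 2.5
```

Because the three variables are multiplied, a low value in any one of them drags down the whole indicator, no matter how many devs the squad has.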

As mentioned before, the ideal number of devs in a squad should vary between four and six. With fewer than four, it should not be considered a fully fledged squad, because in general that is not enough people to focus on the problem in a consistent way. With more than six we start to have communication problems and, in general, adding more people just worsens performance.

To keep the demonstration of how to split the team simple, I’ll skip the discussion about the team’s maturity, which we will have the opportunity to discuss soon in the next article of this series, and also the health metric. For now, let’s assume that both are balanced and contribute positively to the team’s throughput.

To illustrate the method we will begin with a scenario where a startup is just creating its team.

The first variable from our throughput formula that we should look at is the number of devs. Certainly, until we hire at least six devs we should not split the squad.

The question is: what shall we do when we hire the seventh person?

Ideally, we should not split if that leaves squads with fewer than four devs. So the approach I recommend is to “fill” a squad until we reach eight devs and only then, with a full squad, split it in two. Although we have surpassed the “magic number” of six, the potential loss of communication efficiency is, in my experience, smaller than the “overhead” of creating a new squad with less capacity, which would probably fall into stage 1 (falling behind).

Looking only at the number of devs, the approach to creating the next squads should be the same. Thus, when we reach twelve devs we create the third squad, at sixteen the fourth, at twenty the fifth, and so on as the team grows. It’s important to notice that when we split a squad of eight devs into two squads of four devs each, those new squads are certainly not at maximum throughput. We can therefore increase their throughput by adding up to two devs per squad.
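The rule above can be sketched as a small Python function. It looks only at the number of devs, as this part of the article does. The function name `squad_sizes` and the choice to spread devs as evenly as possible across squads are assumptions made for illustration.

```python
MIN_DEVS_PER_SQUAD = 4  # below this, a squad is not considered fully formed


def squad_sizes(total_devs: int) -> list[int]:
    """Return the size of each squad for a given number of devs.

    A new squad is created only when every squad can have at least
    four devs, so squads appear at 8, 12, 16, 20, ... devs.
    """
    if total_devs < 0:
        raise ValueError("total_devs must not be negative")
    if total_devs == 0:
        return []
    squads = max(1, total_devs // MIN_DEVS_PER_SQUAD)
    base, extra = divmod(total_devs, squads)
    # Assumption: devs are spread as evenly as possible across squads.
    return [base + 1] * extra + [base] * (squads - extra)


print(squad_sizes(7))   # [7]
print(squad_sizes(8))   # [4, 4]
print(squad_sizes(11))  # [6, 5]
print(squad_sizes(12))  # [4, 4, 4]
```

Note that only the first squad ever grows past six devs (at seven devs); after the first split, every squad stays between four and six.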

Wow, it cannot be that easy, right? Right! It’s also very important to take a look at the other two throughput variables. And, that’s something I rarely see technology leaders doing in a structured way.

The idea in this article is not to dwell too much on those two variables. However, we need to discuss why they are important and what they are made of.

Let’s start by looking at the squad’s maturity. Its first part has to do with the seniority of its members.

A common mistake is to ignore the multiple levels of experience found in the market. I will not go too deep into that now (yes, one more article to write); in the meantime, I recommend a very interesting article by Martin Fowler. The important thing here is to realize that we cannot create teams without taking the engineers’ experience into consideration. It’s very important to balance the experience levels of the engineers in a team. In my experience, a good proportion is 25% senior devs, 50% at intermediate levels and 25% beginners. If we grow the team this way, it is easy to know when a squad is close to being balanced. I’ve seen companies fall into the trap of assembling squads that are unbalanced in seniority so many times, and it tends to generate all sorts of problems in the mid to long term.

To simplify the use of the maturity variable, I like to approach it in a binary way: a squad is either balanced or unbalanced. When we have unbalanced squads we know that we are not at maximum throughput, and the important thing is to make sure that when we split again we do not make the squads even more unbalanced.
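A binary check for the seniority part of maturity could look like the sketch below. The 25/50/25 target comes from the proportion mentioned above. The level names and the `tolerance` parameter are assumptions: squads of five or six devs cannot hit the proportion exactly, so some slack is needed, and the value of 0.15 is only a placeholder.

```python
from collections import Counter

# Target proportion described in the text.
TARGET_MIX = {"senior": 0.25, "intermediate": 0.50, "beginner": 0.25}


def is_balanced(levels: list[str], tolerance: float = 0.15) -> bool:
    """Return True when each level's share is within `tolerance` of the target.

    `tolerance` is a hypothetical parameter, not a value from the article.
    """
    if not levels:
        return False
    counts = Counter(levels)
    return all(
        abs(counts[level] / len(levels) - target) <= tolerance
        for level, target in TARGET_MIX.items()
    )


print(is_balanced(["senior", "intermediate", "intermediate", "beginner"]))  # True
print(is_balanced(["senior", "beginner", "beginner", "beginner", "beginner"]))  # False
```

This covers only seniority. Whether the members already work together as an efficient team, the second maturity factor, is not modeled here.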

The last variable in our throughput formula, the squad’s health, is one I will make sure to dedicate one (or more 😃) articles to. To make sure a squad is healthy, we should look at metrics that indicate the squad is generating value for the business. In diagram 1 this would be the equivalent of being in stage 4 (innovating). The Accelerate State of DevOps report illustrates well which metrics you have to excel at in order to have a healthy product team. It’s important to notice that the metrics in Accelerate focus more on the delivery part of the process, which is the foundation for a team that performs really well. Reading the Accelerate State of DevOps report, you can see that there is a direct correlation between those metrics and the performance of the business.

In our throughput formula, the definition of healthy is derived from the Accelerate State of DevOps report. The recommended health metrics to track are lead time, change fail rate, deployment frequency, availability and time to restore from failure. You should define the healthy state for each of those metrics yourself. My recommendation is to aim for the elite level as defined in the Accelerate State of DevOps report, and to treat anything below the high level as a serious issue you have to handle. The important thing here is to be very transparent with the team about the definition and, if you are not in a healthy state, to work to get back on track and minimize the impact on the squad’s throughput, which directly affects the company’s results in the mid and long term.
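One way to make that definition transparent is to write it down as data. The sketch below compares a squad’s current numbers for the five metrics against targets you define yourself. The field names, units and target values are hypothetical placeholders, not the thresholds published in the Accelerate State of DevOps report.

```python
from dataclasses import dataclass


@dataclass
class HealthMetrics:
    lead_time_hours: float
    change_fail_rate: float       # fraction of changes that cause a failure
    deploys_per_week: float
    availability: float           # fraction of time the service is available
    time_to_restore_hours: float


def unhealthy_metrics(current: HealthMetrics, target: HealthMetrics) -> list[str]:
    """List the metrics where the squad is worse than its agreed target."""
    problems = []
    if current.lead_time_hours > target.lead_time_hours:
        problems.append("lead time")
    if current.change_fail_rate > target.change_fail_rate:
        problems.append("change fail rate")
    if current.deploys_per_week < target.deploys_per_week:
        problems.append("deployment frequency")
    if current.availability < target.availability:
        problems.append("availability")
    if current.time_to_restore_hours > target.time_to_restore_hours:
        problems.append("time to restore")
    return problems


# Placeholder targets: replace them with the definition agreed with your team.
target = HealthMetrics(24, 0.15, 7, 0.999, 1)
current = HealthMetrics(48, 0.10, 10, 0.9995, 0.5)
print(unhealthy_metrics(current, target))  # ['lead time']
```

An empty list means the squad meets every target, which is one possible way to decide that its health is “balanced” in the throughput formula.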

Now that you know what makes up the maturity and health of a squad, it is important to use them when splitting squads. An important thing to notice is that if those metrics are unbalanced, they can count negatively in the throughput formula. Hence, before making any change, whether a “split exercise” or an increase in the number of squads, it is important to consider how many squads are unbalanced in terms of both health and maturity.

Now you might ask: should I increase the number of squads only when every squad is totally balanced in terms of maturity and health?

The answer is: no. This would be a utopian world and the reality the market imposes does not allow for that.

The way I recommend looking at this is to take a holistic view of the engineering team and monitor how far we are from maximum throughput, which, as we now know, is the sum of the throughput of each individual squad.
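This holistic view can be sketched as a sum over squads. The example assumes maturity and health are factors between 0 and 1 and that a squad’s maximum is six devs with both factors at 1. The scale, the 0.5 penalty for unbalanced squads and the function names are illustrative assumptions.

```python
MAX_DEVS_PER_SQUAD = 6  # upper end of the ideal squad size

# Each squad is a tuple: (number_of_devs, maturity, health).
Squad = tuple[int, float, float]


def team_throughput(squads: list[Squad]) -> float:
    """Sum of number_of_devs * maturity * health over all squads."""
    return sum(devs * maturity * health for devs, maturity, health in squads)


def throughput_gap(squads: list[Squad]) -> float:
    """Fraction of the assumed maximum throughput that is currently missing."""
    if not squads:
        raise ValueError("at least one squad is required")
    maximum = MAX_DEVS_PER_SQUAD * len(squads)
    return 1 - team_throughput(squads) / maximum


squads = [
    (6, 1.0, 1.0),  # full, balanced, healthy squad
    (4, 1.0, 0.5),  # recently split, struggling on health
    (4, 0.5, 0.5),  # unbalanced and unhealthy
]
print(team_throughput(squads))  # 9.0
print(throughput_gap(squads))   # 0.5
```

Tracking the gap over time, rather than the headcount alone, shows whether adding people or squads is actually moving the team closer to its maximum.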

It is important to understand that this exercise needs to be done constantly: the farther you are from maximum throughput, the less efficient your team is and the longer it takes to get it there. So, when you are growing your team, do not mistake this exercise for a mere “numbers game”; it is not all about volume. A smaller but balanced team delivers far more, and carries far less leadership overhead, than a big but unbalanced one.

I hope that this method helps you grow your team in a balanced and healthy way. And one more thing: do not worry, I have already started writing the other articles promised throughout the text, which I believe can help you along the fascinating journey of building companies and product engineering teams.

Note: This article was originally posted in Portuguese. Starting with this one, the English versions of my articles can be found here on Medium and the Portuguese versions at https://bernardocarneiro.com.br

I love building teams and technology. If you want to read my articles in Portuguese, please go to: https://bernardocarneiro.com.br/