[Book Notes] Team Topologies
“Book Notes” are a way to synthesize my thoughts on a book before setting it aside. I like to read, but frankly I forget the details of a particular book over time. Hopefully this helps!
This “Book Notes” is about Team Topologies, by Matthew Skelton and Manuel Pais.
Key Takeaway: Conway’s Law is real, and sound organizational and communication patterns can help wield it to your advantage.
Conway’s Law
Conway’s Law comes from a 1968 paper by Mel Conway titled “How Do Committees Invent?” Conway summarizes his law as:
Any organization that designs a system (defined more broadly here than just information systems) will inevitably produce a design whose structure is a copy of the organization’s communication structure.
Conway’s Law applies not only to organizational structures, but to the communication between these structures:
Many organizations assume that more communication is always better, but this is not really the case. What we need is focused communication between specific teams… Conway’s law suggests that this kind of many-to-many communication will tend to produce monolithic, tangled, highly coupled, interdependent systems that do not support fast flow. More communication is not necessarily a good thing.
Future innovation is further stifled because Conway’s Law prevents us from seeing possibilities. From Conway’s original paper:
the very act of organizing a design team means that certain design decisions have already been made, explicitly or otherwise. Given any design team organization, there is a class of design alternatives which cannot be effectively pursued by such an organization because the necessary communication paths do not exist. Therefore, there is no such thing as a design group which is both organized and unbiased.
Team Topologies spends a lot of time delving into Conway’s Law because it is inevitable. Recognizing that Conway’s Law is inevitable shifts our thinking from “how do we avoid it?” to “how do we use it to our advantage?”
One way to use Conway’s Law to our advantage is the “inverse Conway maneuver”,
whereby an organization focuses on organizing team structures to match the architecture they want the system to exhibit rather than expecting teams to follow a mandated architecture design
As Michael Nygard says: “Team assignments are the first draft of the architecture.”
Team Topologies Overview
The fundamental unit in Team Topologies is the team.
The team should be the fundamental means of delivery rather than the individual. If we follow this team-first approach, we need to ensure that the people within our teams also have (or develop) a team-first mindset.
Since the team is the fundamental unit of delivery, it’s important that there be a high level of trust across the team. This approach reminded me of Team of Teams (which is also referenced in Team Topologies). Keeping team size in mind (Amazon’s two-pizza rule, Dunbar’s number) strengthens trust by establishing predictable behavior and interactions within teams and across teams.
Building a diverse team is also important:
Recent research in both civilian and military contexts strongly suggests that teams with members of diverse backgrounds tend to produce more creative solutions more rapidly and tend to be better at empathizing with other teams’ needs… This diverse mix of people also appears to foster better results, as team members make fewer assumptions about the context and needs of their software users.
Finally, just as software uses APIs to communicate, teams need APIs to interact with other teams, and shield other teams from internal complexity.
Cognitive Load
Team Topologies spends a lot of time on the concept of “cognitive load”. This resonated with me because I’ve experienced the challenge of building new infrastructure while maintaining existing infrastructure.
Managing cognitive load through teams with clear responsibilities and boundaries is a distinguishing focus of team design in the Team Topologies approach… When cognitive load isn’t considered, teams are spread thin trying to cover an excessive amount of responsibilities and domains. Such a team lacks bandwidth to pursue mastery of their trade and struggles with the costs of switching contexts.
Cognitive load was characterized in 1988 by psychologist John Sweller as “the total amount of mental effort being used in the working memory.”
Sweller defines three different kinds of cognitive load:
Intrinsic cognitive load — relates to aspects of the task fundamental to the problem space (e.g., “What is the structure of a Java class?” “How do I create a new method?”)
Extraneous cognitive load — relates to the environment in which the task is being done (e.g., “How do I deploy this component again?” “How do I configure this service?”)
Germane cognitive load — relates to aspects of the task that need special attention for learning or high performance (e.g., “How should this service interact with the ABC service?”)
Broadly speaking, for effective delivery and operations of modern software systems, organizations should attempt to minimize intrinsic cognitive load (through training, good choice of technologies, hiring, pair programming, etc.) and eliminate extraneous cognitive load altogether (boring or superfluous tasks or commands that add little value to retain in the working memory and can often be automated away), leaving more space for germane cognitive load (which is where the “value add” thinking lies).
Four Fundamental Team Topologies
With the above introduction, the book establishes four different team designs. These designs are built by keeping the idea of “flow” in mind:
At a conceptual level, software architectures should resemble the flows of change they enable; instead of a series of interconnected components, we should be designing flows on top of an underlying platform
The four topologies are:
- Stream-Aligned Team — A stream-aligned team is a team aligned to a single, valuable stream of work; this might be a single product or service, a single set of features, a single user journey, or a single user persona. The stream-aligned team is the primary team type in an organization, and the purpose of the other fundamental team topologies is to reduce the burden on the stream-aligned teams.
- Enabling Team — An enabling team is composed of specialists in a given technical (or product) domain, and they help bridge this capability gap. Such teams cross-cut to the stream-aligned teams and have the required bandwidth to research, try out options, and make informed suggestions on adequate tooling, practices, frameworks, and any of the ecosystem choices around the application stack. The end goal of an enabling team is to increase the autonomy of stream-aligned teams by growing their capabilities with a focus on their problems first, not the solutions per se.
- Complicated-Subsystem Team — A complicated-subsystem team is responsible for building and maintaining a part of the system that depends heavily on specialist knowledge, to the extent that most team members must be specialists in that area of knowledge in order to understand and make changes to the subsystem. The goal of this team is to reduce the cognitive load of stream-aligned teams working on systems that include or use the complicated subsystem.
- Platform Team — The purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy. The stream-aligned team maintains full ownership of building, running, and fixing their application in production. The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services.
Three Fundamental Interaction Modes
In addition to designing teams, we need to design how teams interact. Team Topologies identifies three fundamental interaction modes:
- Collaboration — The collaboration team mode is suitable where a high degree of adaptability or discovery is needed, particularly when exploring new technologies or techniques. The collaboration interaction mode is good for rapid discovery of new things, because it avoids costly hand-offs between teams.
- X-as-a-Service — The X-as-a-Service team interaction mode is suited to situations where there is a need for one or more teams to use a code library, component, API, or platform that “just works” without much effort, where a component or aspect of the system can be effectively provided “as a service” by a distinct team or group of teams.
- Facilitating — The facilitating team interaction mode is suited to situations where one or more teams would benefit from the active help of another team facilitating (or coaching) some aspect of their work. The facilitating interaction mode is the main operating mode of an enabling team.
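To keep the team types and interaction modes straight, I find it helps to write them down as data. The following is only a minimal sketch of my own, not something from the book: the class names, the `provider`/`consumer` fields, and the example teams are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated-subsystem"
    PLATFORM = "platform"


class InteractionMode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "x-as-a-service"
    FACILITATING = "facilitating"


@dataclass(frozen=True)
class Team:
    name: str
    kind: TeamType


@dataclass(frozen=True)
class Interaction:
    provider: Team  # team offering help, a service, or joint effort (assumed naming)
    consumer: Team  # team on the receiving side (assumed naming)
    mode: InteractionMode


def modes_used_by(team: Team, interactions: list[Interaction]) -> set[InteractionMode]:
    """Collect every interaction mode a team takes part in, on either side."""
    return {i.mode for i in interactions if team in (i.provider, i.consumer)}


# Hypothetical teams, for illustration only.
checkout = Team("checkout", TeamType.STREAM_ALIGNED)
infra = Team("infra", TeamType.PLATFORM)
test_coaching = Team("test-coaching", TeamType.ENABLING)

interactions = [
    Interaction(infra, checkout, InteractionMode.X_AS_A_SERVICE),
    Interaction(test_coaching, checkout, InteractionMode.FACILITATING),
]

print(sorted(m.value for m in modes_used_by(checkout, interactions)))
# prints ['facilitating', 'x-as-a-service']
```

Writing the interactions down explicitly like this makes it easier to spot a team that is juggling too many modes at once.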
Putting the Fundamentals into Practice
Team Topologies then covers different strategies for evolving existing team structures towards the topologies. “It is usually best to try to align software boundaries with the different business domain areas.” There are many factors that go into figuring out the right “fracture plane” for a team (“A fracture plane is a natural seam in the software system that allows the system to be split easily into two or more parts.”).
One approach is to view domains through the lens of the Cynefin framework (simple, complicated, complex) and use the following heuristics:
- Identify distinct domains and assign each domain to a single team
- A single team can own 2–3 “simple” domains
- A team owning a “complex” domain should not have any more domains assigned to them
- Avoid having a single team responsible for two “complicated” domains
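The heuristics above are mechanical enough to check in code. Here is a rough sketch under my own assumptions: the `Team` class, the complexity labels, the warning messages, and the example teams are hypothetical, and reading “2–3 simple domains” as a hard cap of three is my interpretation rather than a rule from the book.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Hypothetical labels for the three complexity categories named in these notes.
SIMPLE, COMPLICATED, COMPLEX = "simple", "complicated", "complex"


@dataclass
class Team:
    name: str
    # Maps a domain name to its complexity label.
    domains: dict[str, str] = field(default_factory=dict)


def check_assignments(teams: list[Team]) -> list[str]:
    """Return a warning for each assignment that breaks one of the heuristics."""
    warnings = []

    # Each distinct domain should be assigned to a single team.
    owners = Counter(d for team in teams for d in team.domains)
    for domain, count in sorted(owners.items()):
        if count > 1:
            warnings.append(f"domain '{domain}' is owned by {count} teams")

    for team in teams:
        kinds = Counter(team.domains.values())
        # Assumption: treat three simple domains as the upper limit.
        if kinds[SIMPLE] > 3:
            warnings.append(f"{team.name}: more than 3 simple domains")
        # A complex domain should be the team's only domain.
        if kinds[COMPLEX] >= 1 and len(team.domains) > 1:
            warnings.append(f"{team.name}: complex domain shared with other domains")
        # Avoid two complicated domains on one team.
        if kinds[COMPLICATED] >= 2:
            warnings.append(f"{team.name}: two or more complicated domains")
    return warnings


# Hypothetical example teams.
example_teams = [
    Team("payments", {"billing": COMPLEX, "invoicing": SIMPLE}),
    Team("growth", {"signup": SIMPLE, "onboarding": SIMPLE}),
]
for warning in check_assignments(example_teams):
    print(warning)
# prints: payments: complex domain shared with other domains
```

A check like this is no substitute for judgment, but it can flag the obvious overloads before a reorganization.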
Other heuristics to consider include how quickly a part of the system needs to change, whether the area needs to be highly regulated, geographical team location, and differing risk or performance profiles.
Regardless of how you decide to create the team, the litmus test is:
Does the resulting architecture support more autonomous teams (less dependent teams) with reduced cognitive load (less disparate responsibilities)?
Evolving over time
Keep in mind that this is an evolving process:
Identifying bounded contexts requires a fair amount of business knowledge and technical expertise, so it’s normal to make mistakes initially. But that should not deter you from improving and adapting as you understand your context better, even if that involves some kind of recurring “cost” of service redesign.
As part of this evolution, “there are some situations that act as triggers to redesign team topologies within the organization. Learning to recognize these will help an organization continue to adapt and evolve with its needs”. These triggers include: software has grown too large for one team, delivery cadence is becoming slower, and multiple business services rely on a large set of underlying services.
Feedback loops are critical for understanding if and when to change an organizational design.
Jeff Sussna, author of Designing Delivery, puts it like this: “Businesses normally treat operations as an output of design. . . . In order to empathize, though, one must be able to hear. In order to hear, one needs input from operations. Operations thus becomes an input to design.”
Increasingly, software is less of a “product for” and more of an “ongoing conversation with” users… By developing greater empathy for users, we can improve the kinds of interactions we have with them, enhancing their experiences and better meeting their needs.
Conclusion
Team Topologies ends with these five steps for implementing the ideas from the book:
- Start with the Team
- Identify Suitable Streams of Change
- Identify a Thinnest Viable Platform (TVP)
- Identify Capability Gaps in Team Coaching, Mentoring, Service Management, and Documentation
- Share and Practice Different Interaction Modes and Explain Principles behind New Ways of Working
Platform Team
Finally, the “Platform Team” section of the book goes into more detail about what a platform team is. This resonated with me since my current role is on a platform team. I’ve included some tidbits here so I can refer to them later on.
This definition of “platform” is aligned with Evan Bottcher’s definition of a digital platform: A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination.
“Ease of use” is fundamental for platform adoption and reflects the fact that platform teams must treat the services they offer as products that are reliable, usable, and fit for purpose, regardless of if they are consumed by internal or external customers.
Common platforms we find abstract away infrastructure, networking, and other cross-cutting capabilities at a lower level of the stack.
A good platform provides standards, templates, APIs, and well-proven best practices for Dev teams to use to innovate rapidly and effectively.
We should aim for a thinnest viable platform (TVP) and avoid letting the platform dominate the discourse.
A good platform helps Dev teams focus on the germane (differentiating) aspects of a problem, increasing personal and team-level flow, and enabling the whole team to be more effective.
To avoid the too-common trap of building a platform disconnected from the needs of teams, it is essential to ensure that the platform teams have a focus on user experience (UX) and particularly developer experience (DevEx).
Crucially, the evolution of the platform “product” is not simply driven by feature requests from Dev teams; instead, it is curated and carefully shaped to meet their needs in the longer term.