Team Topologies - Organizing for fast flow of value

Organizational evolution for accelerating delivery of comparison services at Uswitch

We spoke with Paul Ingles, CTO at RVU which operates the UK’s leading home services price comparison site Uswitch, to understand their approach to teams and practices for accelerating the flow of software delivery and operations across a growing number of internal engineering teams.

Paul Ingles, CTO at RVU / Uswitch

Paul is an experienced technical leader and became CTO at RVU in London in early 2019. He’s been in technology leadership roles for nearly 15 years and is privileged to have worked and learned at some pioneering technology organisations: ThoughtWorks, Forward Internet Group and now at Uswitch/RVU. For the last 10 years Paul has helped the organisation and technology continue to scale: focusing on smaller, faster releases and strong product (rather than project) oriented teams.  

 

How many software engineering teams does RVU / Uswitch have today and how many are part of the platform team(s)? 

We have 12 teams spread across a variety of problems and products. Since I joined in 2010 we’ve taken a position to organise around long-lived streams of work, staffing teams with a mix of people necessary to be successful: product managers, engineers, designers, data scientists, and so on.

It definitely feels like we’re a relatively fluid organisation, adapting our structure almost continuously to optimise for flow (perhaps sometimes too much). One of our organisational beliefs is “everything is written in pencil” and that applies to how we organise ourselves too!

How did the platform start? What challenges did the platform try to address and how? What are some of the results so far?  

In 2010 we were able to add huge value by focusing on self-sufficiency: we didn’t know how to make the business profitable, but we knew we’d be more likely to find out if we could move faster. Informed by the engineering principles of loose coupling and high cohesion, we reasoned that independent teams were the best route. One trade-off is that teams often need to solve the same problem, and it becomes needlessly wasteful to have four different approaches to precisely the same problem. We’ve therefore sought to operate teams as platforms when we think there’s an opportunity to solve a problem that a large number of other teams encounter independently.

Figure 1: Some of the autonomous teams (aligned to consumer services) at Uswitch, circa 2015

We’ve had a lot of success in some areas, and other areas have been more challenging. Defining good APIs and boundaries is a difficult mix of art and science. Getting it right is critical, but it often feels like something you only really know when you see it; it’s a lot harder to get it right up front.

Our Cloud Infrastructure team has been able to build upon APIs and abstractions that have already had some open-source rigour applied to them, which generally allows for a lot of flexibility and autonomy. The Cloud Infrastructure team is then able to automate more, providing higher-leverage work to the rest of the organisation as a consequence. For example, right now the team is looking at ways to help automate the last mile of software releases with canary deployments. That’s going to be a huge productivity benefit to the organisation as a whole.

Figure 2: Cloud Infrastructure (platform) team collaborating to develop a new service for canary deploys, while continuing to provide existing services in X-as-a-Service interaction mode.

Other platform teams have gradually been able to adapt and refine their interactions over time to deliver value. In the past, we’ve talked about this as wanting platform teams to provide superlinear impact with only sublinear growth in their work.

We’ve learned to be pragmatic and to iterate continually, which can mean asking a team to take on more responsibility at first than it will ultimately hold, in order to get things moving, and then refining over time. A good example is evolving some of our core affiliate marketing systems into a platform, providing APIs and tools “as a Service” to other stream-aligned teams. Initially, the platform team took on more domain responsibility, which helped drive out understanding but ultimately proved to be the wrong boundary.

In general, platform teams provide a great mechanism for force multiplier work. We’ve made big improvements in back-office operations, marketing, and software operations as a consequence of the work our platform teams do.

Were there specific challenges to the idea of treating the platform as a product? 

We’ve been operating software teams around products rather than projects since I joined in 2010. At the time we didn’t really know why (there’s more coverage in books like Team Topologies, Accelerate, and Project to Product now) but it seemed like a more appropriate model: organising teams around streams of work, continual iteration and improvement, autonomy, and a focus on outcomes and goals. A lot of that was because the business at the time was borderline profitable: we were confident we could improve how we worked, and that good results would hopefully follow, but we weren’t sure precisely what we should improve. We’d focus on good release discipline, measurement, and experimentation to figure it out as we went.

Fortunately, that was really successful for us and we’ve sought to operate our platform teams in similar ways: focus on broader outcomes, continual iteration and improvement, and a strong amount of accountability. It has generally helped align platform teams with the needs of the broader organisation, but it’s always a little back and forth, weighing the opportunities the platform teams identify against the stated needs of the stream-aligned teams.

Some platform teams have been easier to set in motion because their service boundaries are more obvious at the outset. Other teams have had to go through a few iterations of building APIs and tools, working with other teams and talking to people to understand where responsibilities lie. I think it has also placed a huge responsibility on whoever takes on the Product Manager role for the team: engaging with the broader organisation and teams to synthesise the problem, helping the team focus on where the value is, continually adapting, and holding everything together is a really difficult task. I think it’s really helped to have people that can hold that vision, communicate well, and bring people together when times get tough or progress feels harder than it should be.

I think it’s also been hard for platform teams to always see the successes that they’re responsible for. A small improvement to the reliability of a data flow for marketing data, for example, may end up making a profit improvement of a few percentage points. It’s often visible to the marketers, but less connected to the work the platform team does. Finding ways to recognise and celebrate the value the platform teams drive has also been difficult.

Figure 3: After a boundary discovery period, the Affiliate Marketing team aligned to the platform topology, providing its services for stream-aligned teams to consume without hand-offs.

How did the platform team measure their own success and effectiveness over time? 

I’ve found OKRs to be simple and helpful so we used that as a model to help set and communicate goals. We’d set them around the number of teams we’d want to adopt the platform, the number of applications using the platform autoscaling service, the proportion of applications switched to the platform dynamic credentials service, and so on. Some of those we’d track over longer periods of time, and others were helpful to guide progress for a quarter and then we’d drop them in favour of something else.

We never mandated the use of the platform, so setting key results for the number of onboarded teams forced us to focus on solving problems that would drive adoption. We also look for natural measures of progress: the proportion of traffic served by the platform, and the proportion of revenue served through platform services are both good examples of that. 
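As a rough illustration of the adoption-style key results described above, here is a minimal Python sketch. The `KeyResult` class, the `proportion` helper, and every number in it are hypothetical and chosen only for the example; they are not Uswitch's actual targets or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyResult:
    """A single measurable key result (hypothetical structure)."""
    name: str
    current: float
    target: float

    @property
    def progress(self) -> float:
        """Fraction of the target achieved so far, capped at 1.0."""
        if self.target <= 0:
            raise ValueError("target must be positive")
        return min(self.current / self.target, 1.0)


def proportion(adopted: int, total: int) -> float:
    """Share of items (applications, requests, ...) that use the platform."""
    if total <= 0:
        raise ValueError("total must be positive")
    return adopted / total


# Illustrative numbers only: 6 of a targeted 8 teams onboarded, and
# 30 of 40 applications switched to a dynamic credentials service
# against a 90% target.
key_results = [
    KeyResult("teams onboarded to the platform", current=6, target=8),
    KeyResult(
        "proportion of apps on dynamic credentials",
        current=proportion(30, 40),
        target=0.9,
    ),
]

for kr in key_results:
    print(f"{kr.name}: {kr.progress:.0%} of target")
```

Expressing each key result as a current value against a target keeps short-lived quarterly measures and longer-running ones in the same shape, so one can be dropped or swapped without changing how progress is reported.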

At one point the infrastructure team wanted to work with a product team that had a stronger expectation of performance: both in the scale of traffic they served and their server response times. That product also had a substantial revenue line so it was really important to get it right. We needed to both understand what our target was, and then demonstrate we could meet it. We introduced Service Level Objectives (SLOs) as a practice and that really helped to communicate objectives clearly between the platform and product teams and focus the platform team on the handful of things that needed improvement. 

Figure 4: Example SLO dashboard provided by the Cloud Infrastructure team as a service. The platform team exposed their own SLOs to demonstrate the reliability of the platform.

SLOs have been a really critical practice to introduce, and something we’ve seen adopted more broadly across other parts of the organisation, driving improvement and balancing fast flow with reliability.
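To make the SLO idea concrete, the following is a minimal Python sketch of how an availability SLO is commonly evaluated. The interview does not describe how Uswitch computes its SLOs, so the error-budget framing, the function names, and the request counts below are all assumptions used purely for illustration.

```python
def service_level_indicator(good_events: int, total_events: int) -> float:
    """Fraction of events that met the objective (e.g. fast, successful requests)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Share of the error budget still unspent for the measurement window.

    The error budget is the number of bad events the SLO tolerates:
    (1 - slo_target) * total_events. A negative result means the
    budget has been overspent and the SLO is breached.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad


# Hypothetical window: 1,000,000 requests, 500 of them bad, 99.9% target.
TARGET = 0.999
sli = service_level_indicator(999_500, 1_000_000)
remaining = error_budget_remaining(999_500, 1_000_000, TARGET)

print(f"SLI: {sli:.4%}, SLO met: {sli >= TARGET}")
print(f"Error budget remaining: {remaining:.1%}")
```

Framing the objective as a budget gives the platform and product teams a shared number to discuss: while budget remains there is room to release quickly, and as it runs down the focus shifts towards reliability work.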

It was harder to objectively measure the experience of people using the platform, so instead we’d spend some time working with other teams to see how they used it, and how they might improve their workflow to make the most of it. We would also periodically reflect on the kinds of conversations we were having on Slack and around the office to gauge progress.

How does the organization of engineering teams at Uswitch map to the concepts and patterns in the Team Topologies book?

Most engineers work within stream-aligned teams as defined in the book. Our stream-aligned teams are relatively small, long-lived, and work to outcomes. Objectives and goals are balanced between the expectations of the organisation and the ideas that teams generate themselves. They generally work on consumer-facing products although we do have a handful of internal products as well, and also products we provide to partners we work with.

Since I joined in 2010 we’ve leaned on engineering principles to guide the way we organise teams: loosely-coupled and highly cohesive, for example. The Team Topologies book is great for tying a lot of those intuitions together and most importantly giving it some language.

I think that the autonomy we sought became conflated with self-sufficiency: teams felt that if they were to be truly responsible for their success they had to own everything. That was a belief that took a while to overcome as we wanted to drive improvements through the creation of some platform teams to reduce cognitive load. The patterns in the book really help to rationalise the drivers and trade-offs in making those decisions.

Figure 5: Team Topologies at Uswitch in 2020, with multiple platforms and SRE as an enabling team. Together, they help reduce the cognitive load on stream-aligned teams.

More recently the patterns in the book have been helpful for considering how we want to handle other teams that are neither stream-aligned nor platform teams. For example, we’ve spun off a small Site Reliability Engineering (SRE) team that helps create a bridge between the Infrastructure platform team and stream-aligned teams (see Figure 5 above). Our SREs work directly with stream-aligned teams to help them automate and build tools that solve their problems. Some of that automation gets pulled into the platform. The SREs balance between the collaboration and facilitation interaction modes.

