With Gaurav Sood

Microservices, Macroproblems

A single page on DoorDash can make upward of 1000 gRPC calls (see the interview). For many engineers, upward of a thousand network calls nicely illustrates the chaos and inefficiency unleashed by microservices. Engineers implicitly diff 1000+ gRPC calls with the orders of magnitude fewer calls that a system designed by an architect looking at the problem afresh today would make. 1000+ gRPC calls also seem like a perfect recipe for blowing up latency. There are more items in the debit column. Microservices can also increase the costs of monitoring, debugging, and deployment (and hence cause greater downtime and worse performance).

Suboptimality on these dimensions, however, may be optimal. First, rather than scale the entire app, with microservices we can scale individual services, saving resources. Second, and more importantly, cheap compute, bandwidth, and caching have made it economically feasible to address latency caused by superfluous network calls. Third, today, the main cost of software development is often developers, and microservices may reduce that cost (we revisit this point later in the essay).

The Appeal of Microservices

When lots of people work together on the ‘same’ thing, they can step on each other’s toes. For instance, if multiple developers are working on the same file (or a feature) at the same time, ‘conflicts’ can ensue; ‘conflict resolution’ is painful.

The natural solutions to the problem are sequencing and compartmentalizing. Sequencing seems like an answer by a smart aleck who notes that we didn’t stipulate that lots of people should work together at the same time. Gotcha!

Compartmentalizing aims higher. Couldn’t specific teams work on specific pieces in a way that they don’t get in each other’s way? Such complete Lego-like modularization has long been a dream of computer scientists. Many systems also seem perfectly modularizable. For instance, say you are running a small publishing platform that sells ads for monetization. You could separate the system that produces ML predictions that drive ad personalization from the system (and codebase) that powers the UI. In fact, you can do a lot more. You can abstract away even the foundational layers of ad prediction. For instance, there could be one team devoted to content understanding and another to context understanding.

But push further and the architecture shows strains. Each team now owns a service and is in charge of deploying, scaling, and maintaining it. (Before we move further, let’s take a moment to marvel at the fact that we can even conceive of small teams owning services. Thanks to ever-greater abstraction in software and access to cloud services, many common deployment tasks are today standardized, e.g., with Lambda, Docker, and Kubernetes. We can even talk of full-stack engineers.) But as soon as you move to managing a complex service, the limitations become clear. The move to microservices leaves many (generally small) teams with little expertise in some of the less standardized tasks and in triaging issues that may span the stack. Naturally, small teams also don’t benefit as much from having their code reviewed by senior engineers in other parts of the organization. Multiple teams may also end up solving the same problem.

Limits of Compartmentalization

At the extremum of microservices, each service is a black box. Services over the Internet are a good metaphor. Outside of a contract (often instantiated just as an agreement over the API, frequency of calls, and some uptime SLA), you don’t need to know more to use an Internet service. The appeal of this system is immediate. It provides a clear delineation of responsibilities. To make the separation complete, advocates of this vision also want no common infrastructure, e.g., a shared database, across services. This kind of vision runs into two problems. First, decoupling makes it hard to maintain consistency of data. Second, decoupling is highly inefficient. Most companies benefit from having shared services, including common databases. In addition, the rise of cloud infrastructure, with cloud-native distributed databases like DynamoDB, has made some of the demand for this kind of firewalling moot.

Different Ways to Compartmentalize

Microservices are but one way to compartmentalize. They happen to be the preferred way to compartmentalize in a world where we prioritize teams becoming experts in business problems rather than in specific aspects of software development. Microservices are, in effect, a product manager’s vision of compartmentalization. However, the issues with microservices, primarily the need for each team to be expert in all portions of the software stack, provide a segue to other ways to compartmentalize. Computer scientists have traditionally opted to compartmentalize over the technical stack, e.g., classically, into backend and frontend teams. The CS compartmentalization prioritizes technical depth in certain areas. In practice, we often see a hybrid of the two approaches. Many companies have infrastructure and DE teams and “pod-like” product or feature-focused teams. The rationale behind the popularity of the hybrid system is again that software benefits both from computer science expertise and from feature-specific problem understanding. The exact balance will be specific to the company (and where it is in its journey), but rarely is it at one end or the other.

Monorepo

The extremum of compartmentalization means not just no shared hardware but also no shared code. But once again, that is rarely an optimal arrangement. Having a single repository makes it easier to enforce standards, build a dependency graph (which can help with triaging issues around backward compatibility), and reuse code. Some clarifications are useful: (1) microservices can (and often do) exist in monorepos; (2) a monorepo doesn’t imply a single language, as standards can be set at the repository and language levels; and (3) a monorepo doesn’t constrain deployment options, as we can deploy services in a modular fashion or as part of some common release. By the same token, having multiple repositories is no bar to enforcing standards (e.g., through common CI/CD tests), building a dependency graph (achievable with a little organization), or reusing code (which can be shipped as libraries). In fact, Amazon has this model: independent repos for “packages” and enough tooling to allow code reusability and backward compatibility checks.
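As a rough illustration of how a dependency graph helps with triaging backward compatibility, the sketch below finds every package that directly or transitively depends on a changed package. The package names and the DEPENDS_ON mapping are hypothetical; in practice, the graph would be derived from build metadata, whether the code lives in one repository or many.

```python
from collections import deque

# Hypothetical dependency map: package -> packages it imports.
DEPENDS_ON = {
    "ads_ui": ["ad_prediction"],
    "ad_prediction": ["content_understanding", "context_understanding"],
    "content_understanding": ["text_utils"],
    "context_understanding": ["text_utils"],
    "text_utils": [],
}


def affected_by(changed, depends_on):
    """Return all packages that directly or transitively depend on `changed`."""
    # Invert the edges so we can walk from a package to its dependents.
    dependents = {pkg: [] for pkg in depends_on}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(pkg)

    # Breadth-first walk up the inverted graph.
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


# Every package listed here needs a compatibility check if text_utils changes.
print(affected_by("text_utils", DEPENDS_ON))
```

A breaking change to a leaf package touches everything above it, while a change to the top-level UI package touches nothing else.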

Function Calling Vs. Network Calling

Function calling is better than network calling in three ways. First, function calling avoids latency from network requests. Second, network calling adds network errors to the potential set of errors and hence makes root-causing harder. Third, with function calls, it is relatively easy to build a dependency graph, which enables checking for backward compatibility. However, building a dependency graph from network calls is workable. For one, we generally write client libraries for the APIs that wrap the network calls in functions. For two, we can explicitly ask for identification, e.g., a team ID, as part of the network call. For three, and more commonly, API deprecation strategies are well developed. These include adding deprecation statuses to the return object, which, along with publishing API specs, makes it the responsibility of the downstream customer to respond to any breaking changes.
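The three mechanisms can be sketched together in a few lines. The example below is a minimal illustration, not any real library's API: AdsClient, fake_transport, the GetAdScore method names, and the sunset_date field are all hypothetical, and the network call is replaced by a local stand-in so that the snippet runs on its own.

```python
import warnings
from dataclasses import dataclass
from typing import Callable, Optional

# Server-side record of who called what; this is the raw material for a
# dependency graph built from network calls.
CALL_LOG: list = []


@dataclass
class Response:
    payload: dict
    deprecated: bool = False            # deprecation status in the return object
    sunset_date: Optional[str] = None   # hypothetical field


def fake_transport(method: str, request: dict) -> Response:
    """Stand-in for the network call; a real client would issue an RPC here."""
    CALL_LOG.append((request["team_id"], method))
    if method == "GetAdScoreV1":
        return Response({"score": 0.5}, deprecated=True, sunset_date="2030-01-01")
    return Response({"score": 0.5})


class AdsClient:
    """Client library that wraps network calls in ordinary functions."""

    def __init__(self, team_id: str, transport: Callable[[str, dict], Response]):
        self.team_id = team_id
        self.transport = transport

    def get_ad_score(self, user_id: str, version: str = "V2") -> dict:
        method = f"GetAdScore{version}"
        # The caller identifies itself with a team ID on every request.
        request = {"team_id": self.team_id, "user_id": user_id}
        response = self.transport(method, request)
        if response.deprecated:
            # Surface the deprecation status to the downstream customer.
            warnings.warn(
                f"{method} is deprecated; sunset date {response.sunset_date}",
                DeprecationWarning,
                stacklevel=2,
            )
        return response.payload


client = AdsClient("team-ui", fake_transport)
client.get_ad_score("user-1")   # default version, no warning
print(CALL_LOG)
```

Because every request carries a team ID, the service owner can list who still calls a deprecated method before removing it.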

Lastly, some people point to another disadvantage of network calling. It is conventional today for everybody in the gRPC call graph to get a ping when a service goes down. However, this ought to be addressable by building logging that traces the issue to a particular service.
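One way such tracing could work, sketched under simplifying assumptions: each call records a span with its parent, and the services to alert are those whose span failed while none of their own downstream calls did. The Span structure and the service names are hypothetical and not tied to any particular tracing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]   # None for the top-level request
    service: str
    ok: bool


def root_cause_services(spans):
    """Return services whose span failed while none of their child spans did."""
    # IDs of spans that have at least one failed child.
    failed_parents = {s.parent_id for s in spans if not s.ok}
    return sorted(
        {s.service for s in spans if not s.ok and s.span_id not in failed_parents}
    )


# Hypothetical trace: page -> menu -> pricing fails; page -> ads succeeds.
TRACE = [
    Span("a", None, "page", ok=False),
    Span("b", "a", "menu", ok=False),
    Span("c", "b", "pricing", ok=False),
    Span("d", "a", "ads", ok=True),
]

print(root_cause_services(TRACE))  # ['pricing']
```

Here the page and menu services fail only because pricing failed, so only the owners of pricing need to be alerted.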

Release Cadence

The smaller the release, the easier it is to triage what went wrong. When you make bulk changes, it is possible for errors to go unnoticed. To give an example from ML, you could easily do two correct feature changes and three bad ones and still have test performance tick in the right direction. And while it may be optimal to release, the counterfactual is that with five correct features (which we would get to if we identified the issues with the three), the performance would have been even better.
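To make the arithmetic concrete, here is a toy calculation with made-up numbers. The per-feature effects are hypothetical, as is the assumption that each bad feature, once fixed, would help as much as a good one.

```python
# Hypothetical per-feature effects on a test metric, in basis points.
bundled_release = {"f1": 30, "f2": 30, "f3": -10, "f4": -10, "f5": -10}

# What a single bulk release shows: the metric still ticks up.
net = sum(bundled_release.values())

# Assumption: each bad feature, once fixed, would contribute as much as a good one.
all_correct = {name: 30 for name in bundled_release}
counterfactual = sum(all_correct.values())

# Regressions that the positive net effect hides.
hidden_regressions = [name for name, delta in bundled_release.items() if delta < 0]

print(net, counterfactual, hidden_regressions)
```

Released one feature at a time, each of the three regressions would have shown up as its own drop in the metric.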

Releasing frequently, however, is not always an option. Release cadence is most strongly affected by how the software is distributed and how many other services depend on the software. For cloud-native software like Google Docs, the releases can be faster. For mobile applications, you cannot release too frequently as updates are disruptive for the user. Even frequent updates to Chrome feel exhausting. Developers of widely used operating systems also have to be cognizant of developers on their platforms. They need to provide enough headroom for developers of important applications that run on the platform to adequately test and amend their software. Small changes seem good from the perspective of being able to detect errors. But small releases mean frequent releases. And releasing too frequently can hinder the ability to detect errors. If you release too frequently, it is not easy to figure out which version you should roll back to, as problems don’t always take seconds to surface. As a result, organizations often snap to some kind of cadence that is a compromise between velocity and the width of the window needed to reliably surface problems from deployments.
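The rollback problem can be illustrated with a small sketch. It assumes, hypothetically, that a problem can take up to max_delay hours to surface, so every release deployed within that window before detection is a suspect. The function name and the numbers are made up for illustration.

```python
def suspect_releases(release_hours, detected_at, max_delay):
    """Return releases deployed within `max_delay` hours before detection."""
    return [t for t in release_hours if detected_at - max_delay <= t <= detected_at]


# Hypothetical schedules over two days, expressed as hours since the start.
hourly = list(range(0, 49))   # one release per hour
daily = [0, 24, 48]           # one release per day

# A problem is noticed at hour 48 and may have taken up to 6 hours to surface.
print(len(suspect_releases(hourly, detected_at=48, max_delay=6)))  # 7
print(len(suspect_releases(daily, detected_at=48, max_delay=6)))   # 1
```

With hourly releases, seven versions could have introduced the problem; with daily releases, only one could have, which makes the rollback target obvious.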

Acknowledgment: The essay benefitted from comments from Naresh Bhatti, Amit Bhosle, Khurram Nasser, and Amit Sharma.

References

  1. Celozi, Cezarre. 2020. Future-proofing: How DoorDash Transitioned from a Code Monolith to a Microservice Architecture. https://careersatdoordash.com/blog/how-doordash-transitioned-from-a-monolith-to-microservice
  2. NeetCodeIO. 2024. Microservices are Technical Debt. https://www.youtube.com/watch?v=LcJKxPXYudE
  3. Kolny, Marcin. 2023. Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%. https://web.archive.org/web/20240415193548/https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90
  4. Potvin, Rachel, and Josh Levenberg. 2016. Why Google stores billions of lines of code in a single repository. Communications of the ACM 59.7: 78-87. https://research.google/pubs/why-google-stores-billions-of-lines-of-code-in-a-single-repository/