Service Mesh Explained

As containerized deployments have grown in popularity, applications increasingly consist of hundreds of containers that are difficult to manage. As a result, the service mesh has become a popular tool for managing communication between services.

In this article you’ll learn how service mesh removes operational toil from your microservices, as well as when it’s worth using one—and when it’s not. We’ll also give an overview of the main service mesh implementations available today.

What Is a Service Mesh?

A service mesh is a tool at the infrastructure layer that transparently adds observability, reliability, load balancing, service discovery, authentication, and authorization support to applications.

The increasing sprawl of microservices makes it challenging to enforce things like service discovery, mutual TLS, circuit breaking, and observability. A service mesh implements all these cross-cutting concerns transparently for applications via a set of network proxies, known as sidecars, deployed alongside each application.

What Does Service Mesh Solve?

A service mesh helps drive business value by removing operational toil from microservices deployment. It allows DevOps teams to implement cross-cutting concerns for all the services employed by an application. These features can be configured from a single, consistent interface within the service mesh called the control plane.

A service mesh offers many features that solve common problems among applications. Here’s a list of some of the most prominent features of service mesh:

  1. Observability: All communication between services running inside a service mesh goes through the mesh infrastructure (the sidecars). As requests pass through, metrics about both inbound and outbound requests are captured and can then be used to optimize communication patterns.
  2. Mutual TLS: A service mesh encrypts all traffic between services transparently. The sidecar proxies intercept outbound connections and encrypt them; similarly, they intercept inbound connections and decrypt the data before passing the request on to the application.
  3. Authentication and Authorization: Since the mesh intercepts all traffic, it also performs authentication and authorization. If either check fails, the mesh rejects the request.
  4. Automatic Retries: Service mesh also implements automatic retries in case of a failed request. These retries are transparent to the application.
  5. Circuit Breaker: If a service is down or slow, the mesh will open the circuit so that all subsequent requests to that service fail fast. This allows for graceful degradation instead of cascading failures and a degraded user experience.
  6. Load Balancing: Service mesh load balances the incoming traffic to a service and ensures that traffic is evenly distributed among multiple instances of a service.
  7. Traffic Policies: Service mesh allows you to control your traffic for multiple scenarios. It supports rule-based traffic splitting for canary rollouts, A/B testing, and staged rollouts with percentage-based traffic splitting.
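To make the retry behavior in item 4 concrete, here is a minimal, self-contained sketch of the retry-with-backoff logic a sidecar proxy applies transparently. This is illustrative only: a real mesh does this at the network layer, and the function and parameter names here are invented for the example.

```javascript
// Sketch of the transparent retry logic a sidecar proxy applies.
// (Illustrative; a real mesh implements this at the network layer.)
async function withRetries(request, { maxAttempts = 3, backoffMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await request(); // forward the request to the upstream service
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Exponential backoff between attempts: 100ms, 200ms, 400ms, ...
        await new Promise((r) => setTimeout(r, backoffMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError; // all attempts failed; surface the last error
}
```

The key point is that the application never sees the intermediate failures; from its perspective, the request either eventually succeeds or fails once.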

Different implementations of service mesh offer different features, but the features discussed above are the most common mesh capabilities.

In the absence of a service mesh, each service team has to implement these features at the application layer, leading to duplicated effort. That's why implementing a service mesh translates into cost savings. We'll look at examples of this in a moment.

What Does Service Mesh Architecture Look Like?

Let’s take a bird’s eye view of service mesh architecture. A service mesh primarily consists of two components: a Control Plane and a Data Plane (see diagram below).

As noted above, a service mesh implements all of its functionality using sidecar proxies. These proxies constitute the service mesh’s Data Plane and are responsible for collecting data for tracing and observability. They are also responsible for intercepting requests for other features, like retries, encryption, circuit breaking, and more.

The Control Plane, on the other hand, is a single, consistent interface used to configure the proxies. You can enable and disable any feature discussed so far using a provided mechanism such as a command-line interface (CLI), an SDK, or an API.

Having a single place to configure the entire service mesh is a powerful construct that brings down the operational overhead significantly.
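For example, with Istio (used here purely for illustration; other meshes use analogous resources), the Control Plane is configured declaratively: you apply a resource like the VirtualService below, and the control plane pushes the resulting routing rules to every sidecar. The service name, subsets, and weights are invented for this sketch.

```yaml
# Hypothetical Istio VirtualService: split traffic 90/10 between two
# versions of a "reviews" service. Names and weights are illustrative.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1   # stable version
          weight: 90
        - destination:
            host: reviews
            subset: v2   # canary version
          weight: 10
```

Applying this one resource reconfigures every sidecar that routes traffic to the service; no application code changes are needed.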

Service Mesh Competitors

There are many service mesh products on the market. Below is a short list of some of the most popular products available today.

In the chart below, we’ll compare four of these products and their features to see how they stack up. The four products we’ve chosen—Istio, Linkerd, Consul, and Open Service Mesh—have the most features available and are among the most popular choices in the industry.

All of the meshes noted here provide a comparable feature set, with slight differences in where and how they can be deployed. The best choice for your company will depend on the ecosystem you have and the level of support you need. If you need a specific feature that only some of them offer, that will naturally narrow down the list.

What Is Service Mesh Interface (SMI)?

As we saw in the previous sections, the market is flooded with service mesh options. That’s why the Cloud Native Computing Foundation (CNCF) adopted the Service Mesh Interface (SMI), a standard specification that covers the most common service mesh capabilities.

SMI applies only to meshes running on Kubernetes, and it is not a comprehensive specification of every service mesh feature. However, it does cover the following capabilities:

  • Traffic policy: Encrypting traffic and enforcing service identities across all services
  • Traffic telemetry: Intercepting requests and capturing metrics on errors and latency
  • Traffic management: Managing the traffic distribution between different services
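As a concrete example of the traffic-management capability, an SMI TrafficSplit resource looks roughly like the sketch below. The service names are invented for the example, and the exact API version varies by SMI release; this sketch assumes v1alpha2.

```yaml
# Hypothetical SMI TrafficSplit: route 90% of traffic to the stable
# backend and 10% to a canary. Names are illustrative.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: website-rollout
spec:
  service: website          # root service that clients call
  backends:
    - service: website-v1   # stable version
      weight: 90
    - service: website-v2   # canary version
      weight: 10
```

Because this is a standard resource, the same manifest should work on any SMI-conformant mesh.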

The goal of SMI was to build a specification against which application developers can build their applications without being locked into any specific mesh implementation.

When to Use a Service Mesh—and When Not to

So far, we’ve discussed the many benefits of using a service mesh. However, a service mesh does come with downsides and complexities. These are some of the most common issues:

  1. Service meshes are complex to implement and require deep expertise in several areas. For example, if you add Istio on top of an orchestrator like Kubernetes, the operators will have to be experts in both technologies.
  2. Service meshes add latency. Because each request is intercepted by a sidecar proxy, every hop incurs an additional delay across the entire architecture.
  3. Service meshes are highly opinionated in the way they are implemented. This forces developers and operators to adapt and conform to a service mesh’s rules.

With these limitations in mind, you should wait to implement a service mesh in your architecture until you operate microservices at a considerable scale or have requirements that only a service mesh can meet.

You can avoid implementing a service mesh when you are just starting out with Kubernetes by implementing some mesh functionality in your application layer. A few open-source libraries that implement a subset of these features in your Node.js applications include opossum for circuit breaking, node-rate-limiter for rate limiting, and Jaeger for distributed tracing.
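As a sketch of what a circuit-breaker library like opossum provides, here is a minimal, self-contained implementation of the pattern. The class name, option names, and thresholds are invented for illustration and are not opossum's actual API.

```javascript
// Minimal circuit-breaker sketch (illustrative; libraries like opossum
// provide a production-ready version of this pattern).
class CircuitBreaker {
  constructor(action, { failureThreshold = 3, resetTimeoutMs = 10000 } = {}) {
    this.action = action;               // async function being protected
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = 'CLOSED';              // CLOSED | OPEN | HALF_OPEN
    this.openedAt = 0;
  }

  async fire(...args) {
    if (this.state === 'OPEN') {
      // After the reset timeout, allow a single trial request through.
      if (Date.now() - this.openedAt >= this.resetTimeoutMs) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('circuit open: failing fast');
      }
    }
    try {
      const result = await this.action(...args);
      this.failures = 0;                // success resets the breaker
      this.state = 'CLOSED';
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';            // trip the breaker
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

Once the breaker is open, callers fail immediately instead of waiting on a slow or dead dependency, which is exactly the behavior a mesh provides transparently at the network layer.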

Conclusion

In this article, we covered the basics of what a service mesh is, including the features it provides. At a high level, there are two components that comprise a service mesh’s architecture. These include a Data Plane made up of sidecar proxies that are responsible for intercepting requests and providing the mesh features, and a Control Plane that allows the configuration of the sidecars from a central point.

We also looked at how service meshes can help eliminate toil from your microservices by elevating the overall resilience, observability, and security of an architecture. A service mesh can solve a wide array of problems that teams face when they start to scale out their microservices architecture, and it can ease the operational burden on infrastructure teams.

That said, teams that are just starting out with Kubernetes or microservices architecture, or who have only a few services, should avoid the complexity of a service mesh. Ultimately, teams should do their due diligence to make sure a service mesh is right for them before investing time and money in implementing one.
