Wednesday, November 07, 2018

gRPC Load Balancing on Kubernetes without Tears

Author: William Morgan (Buoyant)

Many new gRPC users are surprised to find that Kubernetes’s default load balancing often doesn’t work out of the box with gRPC. For example, here’s what happens when you take a simple gRPC Node.js microservices app and deploy it on Kubernetes:

While the voting service displayed here has several pods, it’s clear from Kubernetes’s CPU graphs that only one of the pods is actually doing any work—because only one of the pods is receiving any traffic. Why?

In this blog post, we describe why this happens, and how you can easily fix it by adding gRPC load balancing to any Kubernetes app with Linkerd, a CNCF service mesh and service sidecar.

Why does gRPC need special load balancing?

First, let’s understand why we need to do something special for gRPC.

gRPC is an increasingly common choice for application developers. Compared to alternative protocols such as JSON-over-HTTP, gRPC can provide some significant benefits, including dramatically lower (de)serialization costs, automatic type checking, formalized APIs, and less TCP management overhead.

However, gRPC also breaks standard connection-level load balancing, including what’s provided by Kubernetes. This is because gRPC is built on HTTP/2, and HTTP/2 is designed to have a single long-lived TCP connection, across which all requests are multiplexed, meaning multiple requests can be active on the same connection at any point in time. Normally, this is great, as it reduces the overhead of connection management. However, it also means that connection-level balancing isn’t very useful: the balancer only gets to choose a destination when a connection is opened, so once the connection is established, there’s no more balancing to be done. All requests will get pinned to a single destination pod, as shown below:
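To make the pinning concrete, here is a minimal Python sketch. It is a toy simulation, not real networking code: the pod names and the round-robin connection balancer are assumptions for illustration. The point is only that the balancer is consulted once, at connect time.

```python
import itertools
from collections import Counter

PODS = ["voting-0", "voting-1", "voting-2"]  # hypothetical pod names


class ConnectionLevelBalancer:
    """Picks a backend once per new TCP connection, as an L4 balancer does."""

    def __init__(self, pods):
        self._next_pod = itertools.cycle(pods)

    def connect(self):
        # The only point at which a balancing decision is made.
        return next(self._next_pod)


def send_over_single_connection(balancer, num_requests):
    """HTTP/2 style: open one connection, then multiplex every request over it."""
    pod = balancer.connect()
    return Counter(pod for _ in range(num_requests))


counts = send_over_single_connection(ConnectionLevelBalancer(PODS), 1000)
print(counts)  # every request lands on the pod chosen at connect time
```

However many pods sit behind the service, the single connection means only one of them ever sees traffic.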

Why doesn’t this affect HTTP/1.1?

This problem doesn’t occur in HTTP/1.1, which also has the concept of long-lived connections, because HTTP/1.1 has several features that naturally result in cycling of TCP connections. Because of this, connection-level balancing is “good enough”, and for most HTTP/1.1 apps we don’t need to do anything more.

To understand why, let’s take a deeper look at HTTP/1.1. In contrast to HTTP/2, HTTP/1.1 cannot multiplex requests. Only one HTTP request can be active at a time per TCP connection. The client makes a request, e.g. GET /foo, and then waits until the server responds. While that request-response cycle is happening, no other requests can be issued on that connection.

Usually, we want lots of requests happening in parallel. Therefore, to have concurrent HTTP/1.1 requests, we need to make multiple HTTP/1.1 connections, and issue our requests across all of them. Additionally, long-lived HTTP/1.1 connections typically expire after some time, and are torn down by the client (or server). These two factors combined mean that HTTP/1.1 requests typically cycle across multiple TCP connections, and so connection-level balancing works.
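The two factors above can be sketched with another toy simulation. The concurrency level, the per-connection request limit (a stand-in for connection expiry), and the pod names are all assumed values chosen for illustration.

```python
import itertools
from collections import Counter

PODS = ["voting-0", "voting-1", "voting-2"]  # hypothetical pod names
CONCURRENCY = 4                    # assumed: parallel requests need parallel connections
MAX_REQUESTS_PER_CONNECTION = 10   # assumed: stand-in for connection expiry


def simulate_http1(num_requests):
    """Spread requests over several short-lived HTTP/1.1 connections."""
    next_pod = itertools.cycle(PODS)  # connection-level round-robin balancer
    # Each connection is tracked as [pod, requests_served_so_far].
    connections = [[next(next_pod), 0] for _ in range(CONCURRENCY)]
    counts = Counter()
    for i in range(num_requests):
        conn = connections[i % CONCURRENCY]
        if conn[1] >= MAX_REQUESTS_PER_CONNECTION:
            # Connection expired: reconnect, which re-consults the balancer.
            conn[0], conn[1] = next(next_pod), 0
        counts[conn[0]] += 1
        conn[1] += 1
    return counts


counts = simulate_http1(1200)
print(counts)  # traffic reaches every pod
```

Because connections are opened repeatedly, the connection-level balancer is consulted often enough for traffic to spread across all pods.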

So how do we load balance gRPC?

Now back to gRPC. Since we can’t balance at the connection level, in order to do gRPC load balancing, we need to shift from connection balancing to request balancing. In other words, we need to open an HTTP/2 connection to each destination, and balance requests across these connections, as shown below:

In network terms, this means we need to make decisions at L5/L7 rather than L3/L4, i.e. we need to understand the protocol sent over the TCP connections.
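As a rough sketch of request balancing, the class below keeps one connection per destination and makes a fresh choice for every request. It is a simplified illustration under assumptions: the connections are placeholder strings rather than real HTTP/2 connections, the pod names are hypothetical, and round-robin is just the simplest possible policy.

```python
import itertools
from collections import Counter


class RequestLevelBalancer:
    """Holds one connection per destination and balances per request."""

    def __init__(self, destinations):
        destinations = list(destinations)
        # Placeholder strings stand in for open HTTP/2 connections.
        self._connections = {d: f"h2-connection-to-{d}" for d in destinations}
        self._next_destination = itertools.cycle(destinations)

    def send(self, request):
        """Choose a destination for this one request and return its name."""
        destination = next(self._next_destination)
        _connection = self._connections[destination]  # would carry the request
        return destination


balancer = RequestLevelBalancer(["voting-0", "voting-1", "voting-2"])
counts = Counter(balancer.send(f"request-{i}") for i in range(12))
print(counts)  # each destination receives 4 of the 12 requests
```

The balancing decision has moved from `connect` time to `send` time, which is what distinguishes request balancing from connection balancing.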

How do we accomplish this? There are a few options. First, our application code could manually maintain its own load balancing pool of destinations, and we could configure our gRPC client to use this load balancing pool. This approach gives us the most control, but it can be very complex in environments like Kubernetes where the pool changes over time as Kubernetes reschedules pods. Our application would have to watch the Kubernetes API and keep itself up to date with the pods.

Alternatively, in Kubernetes, we could deploy our app as headless services. In this case, Kubernetes will create multiple A records in the DNS entry for the service. If our gRPC client is sufficiently advanced, it can automatically maintain the load balancing pool from those DNS entries. But this approach restricts us to certain gRPC clients, and it’s rarely possible to only use headless services.
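Here is a minimal sketch of how a client might build its pool from those DNS entries, using Python’s standard `socket.getaddrinfo`. The service hostname, port, and pod IPs are made up, and a stub resolver is injected so the example runs outside a cluster. In a real cluster you would use the default resolver and re-resolve periodically, since the records change as pods come and go.

```python
import socket


def resolve_pod_addresses(hostname, port, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name."""
    records = resolver(hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each record is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({sockaddr[0] for *_rest, sockaddr in records})


def stub_resolver(hostname, port, family=0, type=0):
    """Hypothetical stand-in for cluster DNS so the sketch runs anywhere."""
    pod_ips = ["10.1.0.7", "10.1.0.5", "10.1.0.7"]  # made-up addresses, one duplicate
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", (ip, port)) for ip in pod_ips]


pool = resolve_pod_addresses(
    "voting-svc.default.svc.cluster.local",  # hypothetical headless service name
    8080,
    resolver=stub_resolver,
)
print(pool)  # ['10.1.0.5', '10.1.0.7']
```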

Finally, we can take a third approach: use a lightweight proxy.

gRPC load balancing on Kubernetes with Linkerd

Linkerd is a CNCF-hosted service mesh for Kubernetes. Most relevant to our purposes, Linkerd also functions as a service sidecar, where it can be applied to a single service—even without cluster-wide permissions. What this means is that when we add Linkerd to our service, it adds a tiny, ultra-fast proxy to each pod, and these proxies watch the Kubernetes API and do gRPC load balancing automatically. Our deployment then looks like this:

Using Linkerd has several advantages. First, it works with services written in any language, with any gRPC client, and any deployment model (headless or not). Linkerd’s proxies are completely transparent: they auto-detect HTTP/2 and HTTP/1.x and do L7 load balancing, and they pass through all other traffic as pure TCP. This means that everything will just work.

Second, Linkerd’s load balancing is very sophisticated. Not only does Linkerd maintain a watch on the Kubernetes API and automatically update the load balancing pool as pods get rescheduled, it also uses an exponentially-weighted moving average of response latencies to automatically send requests to the fastest pods. If one pod is slowing down, even momentarily, Linkerd will shift traffic away from it. This can reduce end-to-end tail latencies.
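To illustrate the general idea of latency-aware balancing, here is a small sketch of an exponentially-weighted moving average (EWMA) picker. This is not Linkerd’s implementation. The smoothing factor, pod names, and latencies are assumed values, and the greedy "pick the lowest estimate" policy is a deliberate simplification.

```python
from collections import Counter


class EwmaBalancer:
    """Tracks an EWMA of latency per pod and picks the lowest estimate."""

    def __init__(self, pods, alpha=0.3):
        self._alpha = alpha  # assumed smoothing factor: weight of the newest sample
        self._latency_ms = {pod: 0.0 for pod in pods}  # optimistic initial estimate

    def pick(self):
        return min(self._latency_ms, key=self._latency_ms.get)

    def record(self, pod, observed_ms):
        previous = self._latency_ms[pod]
        self._latency_ms[pod] = self._alpha * observed_ms + (1 - self._alpha) * previous


# Hypothetical pods with made-up response latencies; voting-2 is the slow one.
TRUE_LATENCY_MS = {"voting-0": 10.0, "voting-1": 12.0, "voting-2": 200.0}

balancer = EwmaBalancer(TRUE_LATENCY_MS)
counts = Counter()
for _ in range(100):
    pod = balancer.pick()
    balancer.record(pod, TRUE_LATENCY_MS[pod])
    counts[pod] += 1
print(counts)  # the slow pod is tried once, then avoided
```

A purely greedy picker like this never retries the slow pod, so it cannot notice a recovery. A production balancer would need some way to revisit stale estimates and to account for requests already in flight.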

Finally, Linkerd’s Rust-based proxies are incredibly fast and small. They introduce <1 ms of p99 latency and require <10 MB of RSS per pod, meaning that the impact on system performance will be negligible.

gRPC Load Balancing in 60 seconds

Linkerd is very easy to try. Just follow the steps in the Linkerd Getting Started Instructions—install the CLI on your laptop, install the control plane on your cluster, and “mesh” your service (inject the proxies into each pod). You’ll have Linkerd running on your service in no time, and should see proper gRPC balancing immediately.

Let’s take a look at our sample voting service again, this time after installing Linkerd:

As we can see, the CPU graphs for all pods are active, indicating that all pods are now taking traffic, and we didn’t have to change a line of code. Voilà, gRPC load balancing as if by magic!

Linkerd also gives us built-in traffic-level dashboards, so we don’t even need to guess what’s happening from CPU charts any more. Here’s a Linkerd graph that’s showing the success rate, request volume, and latency percentiles of each pod:

We can see that each pod is getting around 5 RPS. We can also see that, while we’ve solved our load balancing problem, we still have some work to do on our success rate for this service. (The demo app is built with an intentional failure—as an exercise to the reader, see if you can figure it out by using the Linkerd dashboard!)

Wrapping it up

If you’re interested in a dead simple way to add gRPC load balancing to your Kubernetes services, Linkerd lets you do it in a few commands, regardless of what language the services are written in, what gRPC client you’re using, or how they’re deployed.

There’s a lot more to Linkerd, including security, reliability, and debugging and diagnostics features, but those are topics for future blog posts.

Want to learn more? We’d love to have you join our rapidly-growing community! Linkerd is a CNCF project, hosted on GitHub, and has a thriving community on Slack, Twitter, and the mailing lists. Come and join the fun!
