rktnetes brings rkt container engine to Kubernetes

Monday, July 11, 2016

rktnetes brings rkt container engine to Kubernetes

Editor’s note: this post is part of a series of in-depth articles on what’s new in Kubernetes 1.3

As part of Kubernetes 1.3, we’re happy to report that our work to bring interchangeable container engines to Kubernetes is bearing early fruit. What we affectionately call “rktnetes” is included in the version 1.3 Kubernetes release, and is ready for development use. rktnetes integrates support for CoreOS rkt into Kubernetes as the container runtime on cluster nodes, and is now part of the mainline Kubernetes source code. Today it’s easier than ever for developers and ops pros with container portability in mind to try out running Kubernetes with a different container engine.

“We find CoreOS’s rkt a compelling container engine in Kubernetes because of how rkt is composed with the underlying systemd,” said Mark Petrovic, senior MTS and architect at Xoom, a PayPal service. “The rkt runtime assumes only the responsibility it needs to, then delegates to other system services where appropriate. This separation of concerns is important to us.”

What’s rktnetes?

rktnetes is the nickname given to the code that enables Kubernetes nodes to execute application containers with the rkt container engine, rather than with Docker. This change adds new abilities to Kubernetes, for instance running containers under flexible levels of isolation. rkt explores an alternative approach to container runtime architecture, aimed to reflect the Unix philosophy of cleanly separated, modular tools. Work done to support rktnetes also opens up future possibilities for Kubernetes, such as multiple container image format support, and the integration of other container runtimes tailored for specific use cases or platforms.

Why does Kubernetes need rktnetes?

rktnetes is about more than just rkt. It’s also about refining and exercising Kubernetes interfaces, and paving the way for other modular runtimes in the future. While the Docker container engine is well known, and is currently the default Kubernetes container runtime, a number of benefits derive from pluggable container environments. Some clusters may call for very specific container engine implementations, for example, and ensuring the Kubernetes design is flexible enough to support alternate runtimes, starting with rkt, helps keep the interfaces between components clean and simple.

Separation of concerns: Decomposing the monolithic container daemon

The current container runtime used by Kubernetes imposes a number of design decisions. Experimenting with other container execution architectures is worthwhile in such a rapidly evolving space. Today, when Kubernetes sends a request to a node to start running a pod, it communicates through the kubelet on each node with the default container runtime’s central daemon, responsible for managing all of the node’s containers.

rkt does not implement a monolithic container management daemon. (It is worth noting that the default container runtime is in the midst of refactoring its original monolithic architecture.) The rkt design has from day one tried to apply the principle of modularity to the fullest, including reusing well-tested system components, rather than reimplementing them.

The task of building container images is abstracted away from the container runtime core in rkt, and implemented by an independent utility. The same approach is taken to ongoing container lifecycle management. A single binary, rkt, configures the environment and prepares container images for execution, then sets the container application and its isolation environment running. At this point, the rkt program has done its “one job”, and the container isolator takes over.

The API for querying container engine and pod state, used by Kubernetes to track cluster work on each node, is implemented in a separate service, isolating coordination and orchestration features from the core container runtime. While the API service does not fully implement all the API features of the current default container engine, it already helps isolate containers from failures and upgrades in the core runtime, and provides the read-only parts of the expected API for querying container metadata.

Modular container isolation levels

With rkt managing container execution, Kubernetes can take advantage of the CoreOS container engine’s modular stage1 isolation mechanism. The typical container runs under rkt in a software-isolated environment constructed from Linux kernel namespaces, cgroups, and other facilities. Containers isolated in this common way nevertheless share a single kernel with all the other containers on a system, making for lightweight isolation of running apps.

However, rkt features pluggable isolation environments, referred to as stage1s, to change how containers are executed and isolated. For example, the rkt fly stage1 runs containers in the host namespaces (PID, mount, network, etc), granting containers greater power on the host system. Fly is used for containerizing lower-level system and network software, like the kubelet itself. At the other end of the isolation spectrum, the KVM stage1 runs standard app containers as individual virtual machines, each above its own Linux kernel, managed by the KVM hypervisor. This isolation level can be useful for high security and multi-tenant cluster workloads.

Currently, rktnetes can use the KVM stage1 to execute all containers on a node with VM isolation by setting the kubelet’s –rkt-stage1-image option. Experimental work exists to choose the stage1 isolation regime on a per-pod basis with a Kubernetes annotation declaring the pod’s appropriate stage1. KVM containers and standard Linux containers can be mixed together in the same cluster.

How rkt works with Kubernetes

Kubernetes today talks to the default container engine over an API provided by the Docker daemon. rktnetes communicates with rkt a little bit differently. First, there is a distinction between how Kubernetes changes the state of a node’s containers – how it starts and stops pods, or reschedules them for failover or scaling – and how the orchestrator queries pod metadata for regular, read-only bookkeeping. Two different facilities implement these two different cases.

Managing microservice lifecycles

The kubelet on each cluster node communicates with rkt to prepare containers and their environments into pods, and with systemd, the linux service management framework, to invoke and manage the pod processes. Pods are then managed as systemd services, and the kubelet sends systemd commands over dbus to manipulate them. Lifecycle management, such as restarting failed pods and killing completed processes, is handled by systemd, at the kubelet’s behest.

The API service for reading pod data

A discrete rkt api-service implements the pod introspection mechanisms expected by Kubernetes. While each node’s kubelet uses systemd to start, stop, and restart pods as services, it contacts the API service to read container runtime metadata. This includes basic orchestration information such as the number of pods running on the node, the names and networks of those pods, and the details of pod configuration, resource limits and storage volumes (think of the information shown by the kubectl describe subcommand).

Pod logs, having been written to journal files, are made available for kubectl logs and other forensic subcommands by the API service as well, which reads from log files to provide pod log data to the kubelet for answering control plane requests.

This dual interface to the container environment is an area of very active development, and plans are for the API service to expand to provide methods for the pod manipulation commands. The underlying mechanism will continue to keep separation of concerns in mind, but will hide more of this from the kubelet. The methods the kubelet uses to control the rktnetes container engine will grow less different from the default container runtime interface over time.

Try rktnetes

So what can you do with rktnetes today? Currently, rktnetes passes all of the applicable Kubernetes “end-to-end” (aka “e2e”) tests, provides standard metrics to cAdvisor, manages networks using CNI, handles per-container/pod logs, and automatically garbage collects old containers and images. Kubernetes running on rkt already provides more than the basics of a modular, flexible container runtime for Kubernetes clusters, and it is already a functional part of our development environment at CoreOS.

Developers and early adopters can follow the known issues in the rktnetes notes to get an idea of the wrinkles and bumps test-drivers can expect to encounter. This list groups the high-level pieces required to bring rktnetes to feature parity with the existing container runtime and API. We hope you’ll try out rktnetes in your Kubernetes clusters, too.

Use rkt with Kubernetes Today

The introductory guide Running Kubernetes on rkt walks through the steps to spin up a rktnetes cluster, from kubelet –container-runtime=rkt to networking and starting pods. This intro also sketches the configuration you’ll need to start a cluster on GCE with the Kubernetes kube-up.sh script.

Recent work aims to make rktnetes cluster creation much easier, too. While not yet merged, an in-progress pull request creates a single rktnetes configuration toggle to select rkt as the container engine when deploying a Kubernetes cluster with the coreos-kubernetes configuration tools. You can also check out the rktnetes workshop project, which launches a single-node rktnetes cluster on just about any developer workstation with one vagrant up command.

We’re excited to see the experiments the wider Kubernetes and CoreOS communities devise to put rktnetes to the test, and we welcome your input – and pull requests!

–Yifan Gu and Josh Wood, rktnetes Team, CoreOS. Twitter @CoreOSLinux.

Raw Block Volume support to Beta Mar 7
Automate Operations on your Cluster with OperatorHub.io Feb 28
Building a Kubernetes Edge (Ingress) Control Plane for Envoy v2 Feb 12
Runc and CVE-2019-5736 Feb 11
Poseidon-Firmament Scheduler – Flow Network Graph Based Scheduler Feb 6
Update on Volume Snapshot Alpha for Kubernetes Jan 17
Container Storage Interface (CSI) for Kubernetes GA Jan 15
APIServer dry-run and kubectl diff Jan 14

Creating a Raspberry Pi cluster running Kubernetes, the installation (Part 2) Dec 22
Managing Kubernetes Pods, Services and Replication Controllers with Puppet Dec 17
How Weave built a multi-deployment solution for Scope using Kubernetes Dec 12
Creating a Raspberry Pi cluster running Kubernetes, the shopping list (Part 1) Nov 25
Monitoring Kubernetes with Sysdig Nov 19
One million requests per second: Dependable and dynamic distributed systems at scale Nov 11
Kubernetes 1.1 Performance upgrades, improved tooling and a growing community Nov 9
Kubernetes as Foundation for Cloud Native PaaS Nov 3
Some things you didn’t know about kubectl Oct 28
Kubernetes Performance Measurements and Roadmap Sep 10
Using Kubernetes Namespaces to Manage Environments Aug 28
Weekly Kubernetes Community Hangout Notes - July 31 2015 Aug 4
The Growing Kubernetes Ecosystem Jul 24
Weekly Kubernetes Community Hangout Notes - July 17 2015 Jul 23
Strong, Simple SSL for Kubernetes Services Jul 14
Weekly Kubernetes Community Hangout Notes - July 10 2015 Jul 13
Announcing the First Kubernetes Enterprise Training Course Jul 8
Kubernetes 1.0 Launch Event at OSCON Jul 2
How did the Quake demo from DockerCon Work? Jul 2
The Distributed System ToolKit: Patterns for Composite Containers Jun 29
Slides: Cluster Management with Kubernetes, talk given at the University of Edinburgh Jun 26
Cluster Level Logging with Kubernetes Jun 11
Weekly Kubernetes Community Hangout Notes - May 22 2015 Jun 2
Kubernetes on OpenStack May 19
Weekly Kubernetes Community Hangout Notes - May 15 2015 May 18
Docker and Kubernetes and AppC May 18
Kubernetes Release: 0.17.0 May 15
Resource Usage Monitoring in Kubernetes May 12
Weekly Kubernetes Community Hangout Notes - May 1 2015 May 11
Kubernetes Release: 0.16.0 May 11
AppC Support for Kubernetes through RKT May 4
Weekly Kubernetes Community Hangout Notes - April 24 2015 Apr 30
Borg: The Predecessor to Kubernetes Apr 23
Kubernetes and the Mesosphere DCOS Apr 22
Weekly Kubernetes Community Hangout Notes - April 17 2015 Apr 17
Kubernetes Release: 0.15.0 Apr 16
Introducing Kubernetes API Version v1beta3 Apr 16
Weekly Kubernetes Community Hangout Notes - April 10 2015 Apr 11
Faster than a speeding Latte Apr 6
Weekly Kubernetes Community Hangout Notes - April 3 2015 Apr 4
Participate in a Kubernetes User Experience Study Mar 31
Weekly Kubernetes Community Hangout Notes - March 27 2015 Mar 28
Kubernetes Gathering Videos Mar 23
Welcome to the Kubernetes Blog! Mar 20

rktnetes brings rkt container engine to Kubernetes

Monday, July 11, 2016