Troubleshooting

This document with highlighting how to troubleshoot the deployment of a Kubernetes cluster, it will not cover debugging of workloads inside Kubernetes.

Before you begin
Understanding Cluster Status
SSHing to units
Collecting debug information
Common Problems
Logging and monitoring

Before you begin

This page assumes you have a working Juju deployed cluster.

Understanding Cluster Status

Using juju status can give you some insight as to what’s happening in a cluster:

Model  Controller  Cloud/Region   Version
kubes  work-multi  aws/us-east-2  2.0.2.1

App                Version  Status  Scale  Charm              Store       Rev  OS      Notes
easyrsa            3.0.1    active      1  easyrsa            jujucharms    3  ubuntu  
etcd               2.2.5    active      1  etcd               jujucharms   17  ubuntu  
flannel            0.6.1    active      2  flannel            jujucharms    6  ubuntu  
kubernetes-master  1.4.5    active      1  kubernetes-master  jujucharms    8  ubuntu  exposed
kubernetes-worker  1.4.5    active      1  kubernetes-worker  jujucharms   11  ubuntu  exposed

Unit                  Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*            active    idle   0/lxd/0  10.0.0.55                       Certificate Authority connected.
etcd/0*               active    idle   0        52.15.47.228    2379/tcp        Healthy with 1 known peers.
kubernetes-master/0*  active    idle   0        52.15.47.228    6443/tcp        Kubernetes master services ready.
  flannel/1           active    idle            52.15.47.228                    Flannel subnet 10.1.75.1/24
kubernetes-worker/0*  active    idle   1        52.15.177.233   80/tcp,443/tcp  Kubernetes worker running.
  flannel/0*          active    idle            52.15.177.233                   Flannel subnet 10.1.63.1/24

Machine  State    DNS            Inst id              Series  AZ
0        started  52.15.47.228   i-0bb211a18be691473  xenial  us-east-2a
0/lxd/0  started  10.0.0.55      juju-153b74-0-lxd-0  xenial  
1        started  52.15.177.233  i-0502d7de733be31bb  xenial  us-east-2b

In this example we can glean some information. The Workload column will show the status of a given service. The Message section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload my say maintenance while message will describe this maintenance as Installing docker.

During normal operation the Workload should read active, the Agent column (which reflects what the Juju agent is doing) should read idle, and the messages will either say Ready or another descriptive term. juju status --color will also return all green results when a cluster’s deployment is healthy.

Status can become unwieldy for large clusters, it is then recommended to check status on individual services, for example to check the status on the workers only:

juju status kubernetes-worker

or just on the etcd cluster:

juju status etcd

Errors will have an obvious message, and will return a red result when used with juju status --color. Nodes that come up in this manner should be investigated.

SSHing to units

You can ssh to individual units easily with the following convention, juju ssh <servicename>/<unit#>:

juju ssh kubernetes-worker/3

Will automatically ssh you to the 3rd worker unit.

juju ssh easyrsa/0

This will automatically ssh you to the easyrsa unit.

Collecting debug information

Sometimes it is useful to collect all the information from a cluster to share with a developer to identify problems. This is best accomplished with CDK Field Agent.

Download and execute the collect.py script from CDK Field Agent on a box that has a Juju client configured with the current controller and model pointing at the CDK deployment of interest.

Running the script will generate a tarball of system information and includes basic information such as systemctl status, Juju logs, charm unit data, etc. Additional application-specific information may be included as well.

Common Problems

Load Balancer interfering with Helm

This section assumes you have a working deployment of Kubernetes via Juju using a Load Balancer for the API, and that you are using Helm to deploy charts.

To deploy Helm you will have run:

helm init
$HELM_HOME has been configured at /home/ubuntu/.helm
Tiller (the helm server side component) has been installed into your Kubernetes Cluster.
Happy Helming!

Then when using helm you may see one of the following errors:

Helm doesn’t get the version from the Tiller server

helm version
Client: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"}
Error: cannot connect to Tiller

Helm cannot install your chart

helm install <chart> --debug
Error: forwarding ports: error upgrading connection: Upgrade request required

This is caused by the API load balancer not forwarding ports in the context of the helm client-server relationship. To deploy using helm, you will need to follow these steps:

Expose the Kubernetes Master service

   juju expose kubernetes-master

Identify the public IP address of one of your masters

   juju status kubernetes-master
   Model       Controller  Cloud/Region   Version
   production  k8s-admin   aws/us-east-1  2.0.0

   App                Version  Status  Scale  Charm              Store       Rev  OS      Notes
   flannel            0.6.1    active      1  flannel            jujucharms    7  ubuntu
   kubernetes-master  1.5.1    active      1  kubernetes-master  jujucharms   10  ubuntu  exposed

   Unit                  Workload  Agent  Machine  Public address  Ports     Message
   kubernetes-master/0*  active    idle   5        54.210.100.102    6443/tcp  Kubernetes master running.
     flannel/0           active    idle            54.210.100.102              Flannel subnet 10.1.50.1/24

   Machine  State    DNS           Inst id              Series  AZ
   5        started  54.210.100.102  i-002b7150639eb183b  xenial  us-east-1a

   Relation      Provides               Consumes           Type
   certificates  easyrsa                kubernetes-master  regular
   etcd          etcd                   flannel            regular
   etcd          etcd                   kubernetes-master  regular
   cni           flannel                kubernetes-master  regular
   loadbalancer  kubeapi-load-balancer  kubernetes-master  regular
   cni           kubernetes-master      flannel            subordinate
   cluster-dns   kubernetes-master      kubernetes-worker  regular
   cni           kubernetes-worker      flannel            subordinate

In this context the public IP address is 54.210.100.102.

If you want to access this data programmatically you can use the JSON output:

   juju show-status kubernetes-master --format json | jq --raw-output '.applications."kubernetes-master".units | keys[]'
   54.210.100.102

Update the kubeconfig file

Identify the kubeconfig file or section used for this cluster, and edit the server configuration.

By default, it will look like https://54.213.123.123:443. Replace it with the Kubernetes Master endpoint https://54.210.100.102:6443 and save.

Note that the default port used by CDK for the Kubernetes Master API is 6443 while the port exposed by the load balancer is 443.

Start helm again!

   helm install <chart> --debug
   Created tunnel using local port: '36749'
   SERVER: "localhost:36749"
   CHART PATH: /home/ubuntu/.helm/<chart>
   NAME:   <chart>
   ...
   ...

Logging and monitoring

By default there is no log aggregation of the Kubernetes nodes, each node logs locally. Please read over the logging page for more information.

Feedback

Was this page helpful?

Thanks for the feedback. If you have a specific, answerable question about how to use Kubernetes, ask it on Stack Overflow. Open an issue in the GitHub repo if you want to report a problem or suggest an improvement.