A Pod Restarts. So, What’s Going on?

May 1, 2020

In the Kubernetes world, pods are considered relatively ephemeral (rather than durable) entities. In other words, we cannot expect a pod to be a long-running resource. Pods get terminated, restarted, or re-initialized whenever a change is introduced, and changes can come from many directions.

A software system can only be perfectly stable if it exists in a vacuum. If we stop changing the codebase, we stop introducing bugs. If the underlying hardware or libraries never change, neither of these components will introduce bugs. If we freeze the current user base, we’ll never have to scale the system.

Ref: https://landing.google.com/sre/sre-book/chapters/simplicity/

A pod can have one or more containers. One is the application container. Others could be an init container, which runs a specific task to completion before the application container starts, or a sidecar container, which runs alongside the main application container.

Let’s first look at how to see the pods, their restarts, and their health. How can we know how many containers a pod has? Describing the pod gives the details: kubectl describe pod [pod-name].
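If only the container names are needed, a jsonpath query is a lighter alternative to reading the full describe output. The following is a minimal sketch; the function name pod_containers is just an illustrative helper, and the namespace falls back to default when omitted.

```shell
# List the names of the regular (non-init) containers defined in a pod.
# Usage: pod_containers <pod-name> [namespace]
# pod_containers is a hypothetical helper name, not a kubectl feature.
pod_containers() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.containers[*].name}'
}
```

For example, pod_containers [pod-name] devops prints the container names separated by spaces.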

A detailed view of the pods running in a particular namespace of a cluster can also be seen with kubectl get pods:

In the above scenario for the monitoring namespace, the first, second, and fourth pods have a READY value of 2/2 and the rest have 1/1. In the first case, this means two out of two containers are healthy and ready to serve. The rest are single-container pods, and they are healthy too.

The fourth column shows the restart count. The fifth pod has a RESTARTS value of 2, which means its containers were restarted twice in the 6 days and 13 hours since the pod was created. The rest of the pods have not been restarted. Do not confuse a restart with re-creation of the pod: restart and re-creation (or re-initialization) are different things, which we will also discuss below.
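In a busy namespace it can help to filter the listing down to pods that have actually restarted. This sketch assumes the default kubectl get pods column order (NAME, READY, STATUS, RESTARTS, AGE); restarted_pods is an illustrative helper name.

```shell
# Print "<pod> <restart-count>" for every pod whose RESTARTS column is non-zero.
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
# Usage: restarted_pods [namespace]
restarted_pods() {
  kubectl get pods -n "${1:-default}" --no-headers |
    awk '$4 > 0 { print $1, $4 }'
}
```

Running restarted_pods monitoring would list only the pods worth investigating.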

Coming back to the question of why a pod restarts: the points below also cover cases of pod re-initialization. The difference is that a restart keeps the pod name the same, while re-initialization under a Deployment creates a new pod with a new suffix in its name:

1. New deployment

When a new version of a container is deployed, the pod is re-initialized.

$ kubectl create deploy nginx --image nginx:1.17.0-alpine -n devops
deployment.apps/nginx created
$ kubectl get po -n devops
NAME                     READY   STATUS    RESTARTS   AGE
nginx-5759f56c55-cjv57   1/1     Running   0          7s

Now, I need to upgrade the nginx version to 1.18.0-alpine.

While the new version of the pod was being deployed (which took around 10s), it had a STATUS of ContainerCreating, and once it was ready, the old pod was killed.
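One way to perform such an upgrade from the command line is kubectl set image followed by kubectl rollout status. This is a sketch rather than the exact command used for the run above; upgrade_image is an illustrative helper, and it assumes the container is named nginx, which is what kubectl create deploy nginx produces.

```shell
# Roll a Deployment to a new image and wait for the rollout to finish.
# Usage: upgrade_image <deployment> <container> <image> [namespace]
# upgrade_image is a hypothetical helper name.
upgrade_image() {
  kubectl set image "deployment/$1" "$2=$3" -n "${4:-default}" &&
    kubectl rollout status "deployment/$1" -n "${4:-default}"
}
```

For the example here, the call would be upgrade_image nginx nginx nginx:1.18.0-alpine devops.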

2. Change in a pod's environment variables

We can define environment variables for one or more pods. Listing the defined environment variables:

$ kubectl set env pods --all --list -n devops
# Pod nginx-5777594854-8pnxg, container nginx

We add a new variable and check the pods:

$ kubectl set env deployment/nginx DATE=$(date '+%d/%m/%Y-%H:%M:%S') -n devops
deployment.extensions/nginx env updated
$ kubectl get po -n devops
NAME                     READY   STATUS              RESTARTS   AGE
nginx-7849b54d8d-tzwjx   1/1     Running             0          14s
nginx-85c988d647-4s7sr   0/1     ContainerCreating   0          2s

Yes, it initialized a new pod, and once that pod is ready, the older one gets terminated. When many pods are running, adding or updating an environment variable updates them gradually rather than all at once.

3. HealthCheck failure

There are three probes for checking the health of a pod: liveness, readiness, and startup probes.

The readiness probe indicates that the container is ready to serve traffic. This means a load balancer will not send traffic to the container until its readiness probe succeeds.

The liveness probe recovers a container that is deadlocked or stuck and no longer useful. It is the probe mainly responsible for restarting a container, so this is where we focus while debugging a restarting container. A simple definition:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

Here, the liveness check starts after a delay of 3 seconds and sends an HTTP GET request to the /healthz path on port 8080 every 3 seconds. If the check fails, the kubelet kills the container, and it keeps getting restarted until the probe succeeds.

Continuous restarts change the pod's STATUS to CrashLoopBackOff.
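The "BackOff" part refers to the growing wait between restarts. The Kubernetes documentation describes an exponential back-off (10s, 20s, 40s, and so on) capped at five minutes. The sketch below only reproduces that arithmetic under those documented defaults; backoff_delay is an illustrative helper, and real timings on a cluster may differ.

```shell
# Delay in seconds for the Nth step of the restart back-off sequence,
# assuming the documented defaults: 10s base, doubling, capped at 300s.
# Usage: backoff_delay <step>   (1 = first step of the sequence)
backoff_delay() {
  n="$1"
  delay=10
  # Double the delay once per additional step, stopping once the cap is reached.
  while [ "$n" -gt 1 ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    n=$((n - 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}
```

This is why a crashing container appears to restart quickly at first and then sits in CrashLoopBackOff for minutes at a time.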

Let’s do a test. I changed the probe to initialDelaySeconds: 1 and periodSeconds: 1 and applied the manifest. Here is the result:

$ kubectl get po -n devops
NAME                     READY   STATUS    RESTARTS   AGE
liveness-http            1/1     Running   1          21s
$ kubectl get po -n devops
NAME                     READY   STATUS    RESTARTS   AGE
liveness-http            1/1     Running   2          32s
$ kubectl get po -n devops
NAME                     READY   STATUS    RESTARTS   AGE
liveness-http            1/1     Running   4          92s
$ kubectl get po -n devops
NAME                     READY   STATUS             RESTARTS   AGE
liveness-http            0/1     CrashLoopBackOff   5          2m27s

The restart count gradually increased and ultimately resulted in CrashLoopBackOff. But how can we debug this? Describe the pod and look at the events at the end:

$ kubectl describe pod liveness-http -n devops
.....
Events:
  Type     Reason     Age                    From                                               Message
  ----     ------     ----                   ----                                               -------
  Normal   Scheduled  4m30s                  default-scheduler                                  Successfully assigned devops/liveness-http to ip-10-0-1-117.us-west-2.compute.internal
  Normal   Pulled     4m3s (x3 over 4m29s)   kubelet, ip-10-0-1-117.us-west-2.compute.internal  Successfully pulled image "k8s.gcr.io/liveness"
  Normal   Created    4m3s (x3 over 4m29s)   kubelet, ip-10-0-1-117.us-west-2.compute.internal  Created container liveness
  Normal   Started    4m3s (x3 over 4m29s)   kubelet, ip-10-0-1-117.us-west-2.compute.internal  Started container liveness
  Normal   Pulling    3m50s (x4 over 4m30s)  kubelet, ip-10-0-1-117.us-west-2.compute.internal  Pulling image "k8s.gcr.io/liveness"
  Warning  Unhealthy  3m50s (x9 over 4m18s)  kubelet, ip-10-0-1-117.us-west-2.compute.internal  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    3m50s (x3 over 4m16s)  kubelet, ip-10-0-1-117.us-west-2.compute.internal  Container liveness failed liveness probe, will be restarted

It clearly shows that the liveness probe failed with an HTTP status code of 500, which resulted in multiple restarts.

For debugging, we can increase the liveness probe's initial delay, or remove the probe for some time, and find the problem by going through the pod logs.
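When a container has already been restarted, the logs of the current instance often do not show the failure. kubectl logs has a --previous flag that returns the output of the prior instance. A minimal sketch, where previous_logs is an illustrative helper and the 50-line default is an arbitrary choice:

```shell
# Show the last lines logged by the previous (restarted) container instance.
# Usage: previous_logs <pod> [namespace] [line-count]
# previous_logs is a hypothetical helper name; 50 lines is an arbitrary default.
previous_logs() {
  kubectl logs "$1" -n "${2:-default}" --previous --tail="${3:-50}"
}
```

For the pod above, that would be previous_logs liveness-http devops.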

4. Draining Node

During node maintenance, which could be for updating the spec, upgrading the version, or fixing problems, the node(s) have to be drained, meaning the pods scheduled on them need to be initialized on a healthy node.

$ kubectl drain node1
node/node1 cordoned
evicting pod "liveness-http"
pod/liveness-http evicted
node/node1 evicted
$ kubectl get no
NAME        STATUS                  ROLES    AGE     VERSION
node1    Ready,SchedulingDisabled   <none>   28m     v1.14.8

When we drained node1, the liveness-http pod was evicted. If the pod is managed by a controller such as a Deployment and compute resources are available on other nodes, a replacement is scheduled there; otherwise, it remains in the Pending state. If a load balancer was sending traffic to it, requests would return errors because none of the pods with that label is healthy. This leads to downtime! One way to minimize the downtime is a PodDisruptionBudget. I have written a long post on implementing the budget in the following blog:

PodDisruptionBudget — A Key for Zero Downtime

Draining a node can be either graceful or forceful. If we use spot instances in AWS or preemptible instances in GCP to save cost, the provider gives a short termination notice, after which the node is cordoned (made unschedulable for pods).

In that short window, the pods scheduled on the to-be-deleted node have to be rescheduled. There is a Helm chart for a Spot Termination Notice Handler that reschedules the pods, but when only a single pod is running for a label and many pods are running on the node, it might still lead to a short period of downtime.
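For the PodDisruptionBudget mentioned above, kubectl can create one imperatively. The sketch below is only an illustration: protect_app is a hypothetical helper, and the selector app=nginx assumes the label that kubectl create deploy nginx applies. A budget only helps when more than one replica is running, since with a single replica and a minimum of 1 the drain has nothing it is allowed to evict.

```shell
# Create a PodDisruptionBudget that keeps at least <min> matching pods
# running during voluntary disruptions such as a node drain.
# Usage: protect_app <pdb-name> <label-selector> <min-available> [namespace]
# protect_app is a hypothetical helper name.
protect_app() {
  kubectl create poddisruptionbudget "$1" \
    --selector="$2" --min-available="$3" -n "${4:-default}"
}
```

An example call would be protect_app nginx-pdb app=nginx 1 devops.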

5. OOM (Out of Memory) Kill

This is one of the most common reasons for a container restart. It happens when resource usage is not configured properly or the application itself behaves unpredictably.

If we have set a memory limit of 600Mi for a container and it tries to use more than this limit, the container is killed with an OOM error. The requests value, on the other hand, is the amount reserved for the container when the pod is scheduled.

spec:
  containers:
  - name: app
    image: nginx
    resources:
      limits:
        memory: "600Mi"
      requests:
        memory: "100Mi"

The memory and CPU usage of a container depends entirely on the application type, the load it handles, the heap memory it uses, and so on. The limits and requests should be set after observing how usage fluctuates during load testing and performance analysis.
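To confirm that a restart was caused by the memory limit, the pod status records why each container last terminated, and an OOM kill shows up there as the reason OOMKilled. A minimal sketch, with was_oom_killed as an illustrative helper name:

```shell
# Succeeds (exit status 0) if any container in the pod was last
# terminated with the reason OOMKilled.
# Usage: was_oom_killed <pod> [namespace]
# was_oom_killed is a hypothetical helper name.
was_oom_killed() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}' |
    grep -q 'OOMKilled'
}
```

It can be used in a condition, for example: if was_oom_killed [pod-name] devops; then echo "raise the memory limit or fix the leak"; fi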

6. High Node Pressure

Resource sharing is both a challenge and a feature of any distributed system, and Kubernetes is no exception. Based on the pressure on a compute node, pods could be evicted and rescheduled to nodes with lower pressure. The kubelet uses CFS (Completely Fair Scheduler) quota to enforce pod CPU limits. When a node runs many CPU-bound pods, the workload can move to different CPU cores depending on whether the pod is throttled and which CPU cores are available at scheduling time.

Even when every container is under its limit, the node itself can run out of memory, which can trigger OOM kills and lead to rescheduling. If the node's disk is full and space has to be freed, pods on it might be evicted.
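Nodes report pressure through their status conditions (such as MemoryPressure, DiskPressure, and PIDPressure), so a quick check before blaming the application is to look at those. The helpers below are illustrative names, not kubectl features:

```shell
# Print "<condition>=<status>" for each condition reported by a node,
# for example MemoryPressure=False or DiskPressure=True.
# Usage: node_conditions <node-name>
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Succeeds (exit status 0) if the node reports any *Pressure condition as True.
# Usage: node_under_pressure <node-name>
node_under_pressure() {
  node_conditions "$1" | grep -q 'Pressure=True'
}
```

For example, node_under_pressure node1 && echo "node1 is under pressure" flags a node whose pods are candidates for eviction.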

Have you come across any other reason why a Kubernetes pod gets terminated or restarted? Please share it in the comments.

Say hi to me on Twitter and LinkedIn, where I keep sharing interesting updates.