PodDisruptionBudget — A Key for Zero Downtime

Raju Dawadi
3 min read · Apr 29, 2020


In finance, things go well when the budget is planned well. Even in an extreme scenario or a disaster, one can sustain the hit if there is a plan. Kubernetes is no different.

In the Kubernetes world, the budget is for pods. We cannot expect everything to go well all the time: changes happen to pods or to the nodes themselves, whether updates, upgrades, or outright disasters. Here, planning means we don’t let everything go down at once, but define a setup where our service neither burns out nor sits on extra resources left unused.

Coming to the point, let’s consider a scenario: we need to upgrade the node version or change the node spec, which happens often. Cluster downscaling is also a normal event. In these cases, the pods running on the to-be-deleted nodes need to be drained. I have three nodes:

$ kubectl get nodes
NAME    STATUS   ROLES    AGE     VERSION
node1   Ready    <none>   3d2h    v1.15.10-eks-bac369
node2   Ready    <none>   3d2h    v1.15.10-eks-bac369
node3   Ready    <none>   5d20h   v1.15.10-eks-bac369

And many pods are running in these nodes:

$ kubectl get po -o wide

We need to remove node1 from the pool, but we cannot do that by detaching it instantly, as that would terminate all the pods running there and could bring services down.

The first step before detaching a node is to mark it unschedulable (cordon it).

$ kubectl cordon node1
node/node1 cordoned
$ kubectl get nodes
NAME    STATUS                     ROLES    AGE     VERSION
node1   Ready,SchedulingDisabled   <none>   3d2h    v1.15.10-eks-bac369
node2   Ready                      <none>   3d2h    v1.15.10-eks-bac369
node3   Ready                      <none>   5d20h   v1.15.10-eks-bac369

Now, if I create new pods, none of them will be scheduled on node1, but the pods created before the cordon keep running there as they are. We need a way to drain them.

$ kubectl drain node1 --ignore-daemonsets

If you have pods with local data, the additional argument --delete-local-data is required. The drain command cordons the node by itself first if that has not already been done.

If you quickly check with kubectl get pods, you will see that the drain instantly terminates all the running pods that were scheduled on node1. This could cause downtime! If you are running only a few pods and all of them were scheduled on that node, it will take some time for the pods to be rescheduled on the other nodes.

In a real scenario, it is not practical to do this one node at a time; a drain usually has to cover lots of nodes at once by passing a label, as shown below. This hurts application performance, if it does not take the application down outright, because we lose a big chunk of the running containers in our Kubernetes cluster.
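As a sketch, assuming the nodes to be replaced carry a label such as node-group=old-pool (a hypothetical label used here only for illustration), kubectl drain also accepts a label selector, so a whole group can be drained with one command:

$ kubectl drain -l node-group=old-pool --ignore-daemonsets --delete-local-data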

To prevent this kind of situation, we set a budget for pods, called a PodDisruptionBudget (PDB).

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp

A PDB limits the number of concurrent disruptions that an application’s pods experience while nodes are being managed. A Deployment, ReplicationController, ReplicaSet, or StatefulSet can be bound to a PodDisruptionBudget through its .spec.selector label selector. The budget is specified using either of two values:

minAvailable: The minimum number of pods that must stay running for the label selector. For example, say we have 20 pods running and minAvailable is set to 10. If the node has to be drained for some reason, or pods have to be evicted, at most 10 of them start terminating at a time and the rest are drained gradually, so at least 10 pods stay in the ready state and the application can keep serving requests. The number should be decided based on the traffic or workload the pods have to handle.

maxUnavailable: The maximum number of pods that can be unavailable at once when a node has to be drained.

In both cases, the value can be specified as an absolute number or as a percentage. For example, if we have 20 pods running and maxUnavailable is set to 50%, then at most 10 pods can be unavailable at a time.
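A single PDB may set either minAvailable or maxUnavailable, not both. As a sketch, the same budget expressed with a percentage-based maxUnavailable (names kept from the example above) would be:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  maxUnavailable: "50%"
  selector:
    matchLabels:
      app: myapp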

For the PodDisruptionBudget to be useful, there should be at least 2 pods running for the label selector; otherwise, with a single replica and minAvailable: 1, the eviction is never allowed, so the drain keeps retrying and the node cannot be drained gracefully.
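For instance, a Deployment running a few replicas whose pod template carries the app: myapp label matched by the PDB selector (the name and image here are illustrative placeholders) gives the budget room to work with:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp        # matched by the PDB's .spec.selector
    spec:
      containers:
      - name: myapp
        image: nginx:1.17 # placeholder image
        ports:
        - containerPort: 80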

The disruption budget can be checked with:

$ kubectl get poddisruptionbudgets
NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
myapp-pdb   1               N/A               0                     7s
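While a drain is in progress, you can keep an eye on the ALLOWED DISRUPTIONS column in watch mode (pdb is the short name for poddisruptionbudgets):

$ kubectl get pdb myapp-pdb -w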

This allows for higher availability while still permitting the cluster administrator to manage the cluster’s nodes.

Say Hi to me on Twitter and LinkedIn, where I keep sharing interesting updates.

