Troubleshoot pod CrashLoopBackOff error

Hello Friends, hope you are doing great. Many of you have already come across a situation where you create a pod but it goes into the “CrashLoopBackOff” state and you have difficulty fixing it. Please note that your pod can fail in all kinds of ways; CrashLoopBackOff is one such failure status/state. In this post, I will explain how to “Troubleshoot pod CrashLoopBackOff error” in detail so that you can resolve this error whenever you face such an issue in the future.

Note: I would also suggest you read this thoroughly so that you can understand each and every step clearly.

What is the Kubernetes CrashLoopBackOff Error?

A CrashLoopBackOff means that you have a pod that keeps starting, crashing, starting again, and crashing again. If you are new to Kubernetes, I would suggest you read the post: Introduction to Kubernetes World

Please note that failed containers are restarted by the kubelet with an exponential back-off delay (10s, 20s, 40s, and so on), capped at five minutes, which is reset after 10 minutes of successful execution. You can refer to the example below for the restartPolicy of a pod. Reference: podRestartPolicy

The PodSpec has a restartPolicy field. Its possible values are Always, OnFailure, and Never, and it applies to all containers in a pod. The default value for restartPolicy is “Always“. Please refer to the config below to see it in context.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: test-kube
  labels:
    name: myPod
    type: proxy
spec:
  containers:
  - name: nginxcontainer
    image: nginx:latest
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
    ports:
      - containerPort: 80
  restartPolicy: Always
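
Once the pod above has been applied, you can confirm which restartPolicy it ended up with straight from the API. This is a small sketch that assumes the nginx pod has been created in the test-kube namespace:

~ $ kubectl get pod nginx -n test-kube -o jsonpath='{.spec.restartPolicy}'
Always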

Why does a CrashLoopBackOff occur?

There are a number of different reasons for a CrashLoopBackOff. Here are a few common ones:

  • The application inside the container keeps crashing.
  • Some of the parameters of the pod/container have been configured incorrectly.
  • An error was made while deploying the Kubernetes pod.

How do I see if my pod is having a CrashLoopBackOff issue?

You usually see this when you run the standard command kubectl get pods (if your pod is in a separate/dedicated namespace, the command is kubectl get pods -n <YourNameSpace>).

~ $ kubectl get pods -n test-kube
NAME                         READY   STATUS             RESTARTS   AGE
challenge-7b97fd8b7f-cdvh4   0/1     CrashLoopBackOff   2          60s

Note: test-kube is my namespace.

Troubleshooting steps:

  • Step 1: Describe the pod to get more information on this.

Doing kubectl describe pod <podname> (in case of a dedicated namespace, execute: kubectl describe pod <podname> -n <YourNameSpace>) will give you more information on the pod.

~ $ kubectl describe pod challenge-7b97fd8b7f-cdvh4 -n test-kube
Name:         challenge-7b97fd8b7f-cdvh4
Namespace:    test-kube
Priority:     0
Node:         minikube/192.168.99.100
Start Time:   Sun, 28 Jun 2020 20:25:14 +0530
Labels:       os=ubuntu
              pod-template-hash=7b97fd8b7f
Annotations:  <none>
Status:       Running
IP:           172.17.0.4
IPs:
  IP:           172.17.0.4
Controlled By:  ReplicaSet/challenge-7b97fd8b7f
Containers:
  my-name:
    Container ID:   docker://4d397634b294992f80067083933cb37f00da27df3674f4ba383f5d882d9bfc3e
    Image:          ubuntu:latest
    Image ID:       docker-pullable://ubuntu@sha256:747d2dbbaaee995098c9792d99bd333c6783ce56150d1b11e333bbceed5c54d7
    Port:           22/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 28 Jun 2020 20:25:14 +0530
      Finished:     Sun, 28 Jun 2020 20:25:23 +0530
    Ready:          False
    Restart Count:  8
    Limits:
      cpu:     500m
      memory:  500Mi
    Requests:
      cpu:        500m
      memory:     500Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5sl7g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-5sl7g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5sl7g
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned test-kube/challenge-7b97fd8b7f-cdvh4 to minikube
  Normal   Pulled     19m (x4 over 20m)  kubelet, minikube  Successfully pulled image "ubuntu:latest"
  Normal   Created    19m (x4 over 20m)  kubelet, minikube  Created container my-name
  Normal   Started    19m (x4 over 20m)  kubelet, minikube  Started container my-name
  Normal   Pulling    18m (x5 over 20m)  kubelet, minikube  Pulling image "ubuntu:latest"
  Warning  BackOff   3s (x93 over 20m)  kubelet, minikube  Back-off restarting failed container
~ $

We got the description of the pod “challenge-7b97fd8b7f-cdvh4“. The first thing we should focus on in this output is the “Events“ section. It tells you what Kubernetes is doing here. Reading the “Events” section from top to bottom tells us:

  • the pod was assigned to a node (Successfully assigned test-kube/challenge-7b97fd8b7f-cdvh4 to minikube)
  • the image was pulled (Successfully pulled image "ubuntu:latest")
  • a container was created from the image (Created container my-name)
  • the container was started (Started container my-name)
  • the pod then went into the “Back-off” state (Back-off restarting failed container).

Further, you can increase the output verbosity of kubectl to see the raw API requests and responses behind the describe call, which can sometimes give you an extra clue about why the pod is exiting. For example: kubectl describe pod <podname> -n test-kube -v=9

For more details on this, you can refer to the Kubectl output verbosity and debugging.

Bonus Tip: You can also run kubectl get events -n test-kube to list the recent events in the namespace directly, including the events for this pod.

~ $ kubectl get events -n test-kube
LAST SEEN   TYPE      REASON    OBJECT                           MESSAGE
9m39s       Normal    Pulling   pod/challenge-7b97fd8b7f-cdvh4   Pulling image "ubuntu:latest"
4m38s       Warning   BackOff   pod/challenge-7b97fd8b7f-cdvh4   Back-off restarting failed container

The message says that the pod is in Back-off restarting failed container. This most likely means that Kubernetes started the container and the container subsequently exited.

As we all know, a Docker container must keep PID 1 running, otherwise the container exits (a container exits when its main process exits). In Docker, the process with PID 1 is the main process, and since it is no longer running, the container stops. When the container stops, Kubernetes tries to restart it (as we have specified spec.restartPolicy as “Always”; for more details, refer to Restart Policy).
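
As a quick local illustration of this point (a sketch that assumes you have Docker installed; it is independent of the cluster used in this post), a container whose main process finishes immediately stops, while one whose main process keeps running stays up:

~ $ docker run --name quick-exit ubuntu:latest /bin/true         # PID 1 exits immediately, so the container stops
~ $ docker run -d --name stays-up ubuntu:latest sleep infinity   # PID 1 never exits, so the container keeps running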

After restarting it a few times, Kubernetes declares the pod to be in the “Back-off” state. However, Kubernetes keeps trying to restart it. You will also see the pod's restart counter (kubectl get pods -n <YourNameSpace>) increasing as Kubernetes keeps restarting the container and the container keeps exiting.

~ $ kubectl get pods -n test-kube
NAME                         READY   STATUS             RESTARTS   AGE
challenge-7b97fd8b7f-cdvh4   0/1     CrashLoopBackOff   2          60s

After checking the events of the pod, you should have an idea of why the pod is failing and going into the CrashLoopBackOff state.
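
You can also pull the last termination details straight from the pod status with jsonpath; these are the same reason and exit code we saw in the describe output above (substitute your own pod name and namespace):

~ $ kubectl get pod challenge-7b97fd8b7f-cdvh4 -n test-kube -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
~ $ kubectl get pod challenge-7b97fd8b7f-cdvh4 -n test-kube -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'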

If you are unable to find enough details, or if you want to debug further, move on to the next steps below.

  • Step 2: Check the logs of the pods:

To view the logs of the pod, execute: kubectl logs <podname> -n <YourNameSpace>.

~ $ kubectl logs challenge-7b97fd8b7f-cdvh4 -n test-kube
Sun Jun 28 14:51:02 UTC 2020
Hello from the Kubernetes cluster
exiting with status 0
~ $

From the above output, you can see that the pod prints some output and then exits (I scripted this output in the deployment file so that I could show you this demo).

With a real application, however, this could mean that your application is exiting for some reason, and hopefully the application logs will tell you, or at least give a clue, why it is exiting.
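
Keep in mind that kubectl logs shows the current container instance. If the container has already been restarted, you can fetch the logs of the previous, crashed instance with the --previous flag:

~ $ kubectl logs challenge-7b97fd8b7f-cdvh4 -n test-kube --previous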

  • Step 3: Look at the Liveness/Readiness probe:

If you have configured liveness and readiness probes in your application's deployment file, you can also look at them to find the reason the application is exiting. Describe the pod to see the Liveness and Readiness configuration, and check the events for probe failures.

kubectl describe pod <podname> -n <YourNameSpace>.
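
For reference, probes are configured on the container spec. Below is a minimal sketch of an exec-based livenessProbe (the /tmp/healthy file and the timing values are purely illustrative and are not part of the deployment used in this post). If the probe fails, the kubelet kills and restarts the container, and the pod events will show a “Liveness probe failed” message.

    spec:
      containers:
      - name: my-name
        image: ubuntu:latest
        livenessProbe:
          exec:
            command: ["cat", "/tmp/healthy"]   # fails if the file does not exist
          initialDelaySeconds: 5               # wait 5s after the container starts
          periodSeconds: 10                    # probe every 10s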

We have discussed the troubleshooting steps in three sections and found the reason the pod is exiting.

In this case, the pod is exiting because the main process is no longer running in the container. As already stated, the container must keep PID 1 running, otherwise it exits.

How to fix this issue?

We now know the pod is failing because no process keeps running in the container. In this case, we have to add a task that never finishes so that the pod stays in the Running state. I will pass while true; do sleep 20; done; as an argument in the deployment file so that my container keeps running.

  • My Deployment file before fixing the issue:
apiVersion: apps/v1
kind: Deployment
metadata:
  name:  challenge
  namespace: test-kube
  labels:
    name:  challenge
spec:
  replicas: 1
  selector:
    matchLabels:
      os: ubuntu
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        os: ubuntu
    spec:
      containers:
      - image:  ubuntu:latest
        imagePullPolicy: Always
        name:  my-name
        command: [ "/bin/bash", "-ec"]
        args: ["date; sleep 10; echo 'Hello from the Kubernetes cluster'; sleep 1; echo 'exiting with status 0'; exit 1;"]
        resources:
          limits:
            cpu: "500m"
            memory: "500Mi"      
        ports:
        - containerPort:  22
          name:  my-name
      restartPolicy: Always
  • My Deployment file after fixing the issue:
apiVersion: apps/v1
kind: Deployment
metadata:
  name:  challenge
  namespace: test-kube
  labels:
    name:  challenge
spec:
  replicas: 1
  selector:
    matchLabels:
      os: ubuntu
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        os: ubuntu
    spec:
      containers:
      - image:  ubuntu:latest
        imagePullPolicy: Always
        name:  my-name
        command: [ "/bin/bash", "-ec"]
        args: ["date; sleep 10; echo 'Hello from the Kubernetes cluster'; sleep 1; while true; do sleep 20; done;"]
        resources:
          limits:
            cpu: "500m"
            memory: "500Mi"      
        ports:
        - containerPort:  22
          name:  my-name
      restartPolicy: Always

I have added a task to keep the container running and now my pod is not failing or exiting.

~ $ kubectl get pods -n test-kube
NAME                        READY   STATUS    RESTARTS   AGE
challenge-5bdf6fc67-mnmhq   1/1     Running   0          2m26s
~ $ kubectl describe pod challenge-5bdf6fc67-mnmhq -n test-kube
Name:         challenge-5bdf6fc67-mnmhq
Namespace:    test-kube
Priority:     0
Node:         minikube/192.168.99.100
Start Time:   Sun, 28 Jun 2020 21:25:14 +0530
Labels:       os=ubuntu
              pod-template-hash=5bdf6fc67
Annotations:  <none>
Status:       Running
IP:           172.17.0.5
IPs:
  IP:           172.17.0.5
Controlled By:  ReplicaSet/challenge-5bdf6fc67
Containers:
  my-name:
    Container ID:  docker://3b6336b34604278c5cd7ed1dbce95e6de8f43254649a940ac5fe455efa5e98a9
    Image:         ubuntu:latest
    Image ID:      docker-pullable://ubuntu@sha256:35c4a2c15539c6c1e4e5fa4e554dac323ad0107d8eb5c582d6ff386b383b7dce
    Port:          22/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -ec
    Args:
      date; sleep 10; echo 'Hello from the Kubernetes cluster'; sleep 1; while true; do sleep 20; done;
    State:          Running
      Started:      Sun, 28 Jun 2020 21:25:21 +0530
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  500Mi
    Requests:
      cpu:        500m
      memory:     500Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5sl7g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-5sl7g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5sl7g
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned test-kube/challenge-5bdf6fc67-mnmhq to minikube
  Normal  Pulling    2m14s      kubelet, minikube  Pulling image "ubuntu:latest"
  Normal  Pulled     2m8s       kubelet, minikube  Successfully pulled image "ubuntu:latest"
  Normal  Created    2m8s       kubelet, minikube  Created container my-name
  Normal  Started    2m8s       kubelet, minikube  Started container my-name
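
As a side note, any long-running foreground command will keep the container, and therefore the pod, alive. For example, sleep infinity (supported by the GNU sleep shipped in the ubuntu image) is a slightly simpler alternative to the while loop used above; the args line could instead read:

        args: ["date; sleep 10; echo 'Hello from the Kubernetes cluster'; sleep 1; sleep infinity"]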

Conclusion:

Containers are meant to run to completion. If no process runs, or the process completes its task, the container will exit. A container must keep PID 1 running in order to stay running.

In this post, we have gone through how to “Troubleshoot pod CrashLoopBackOff error”. Feel free to comment if you still face issues debugging this error or fixing your pod. We will be happy to help you find a solution.
