Hello Friends, hope you are doing great. Many of you have come across a situation where you create a pod, but it goes into the “CrashLoopBackOff” state and you have difficulty fixing it. Please note that your pod can fail in all kinds of ways; CrashLoopBackOff is one such failure status/state. In this post, I will explain how to “Troubleshoot pod CrashLoopBackOff error” in detail so that you can resolve this error whenever you face the issue in future.
Note: I would suggest you read this post thoroughly so that you can understand each and every step clearly.
What is the Kubernetes CrashLoopBackOff Error?
A CrashLoopBackOff means that you have a pod that keeps starting, crashing, starting again, and crashing again. If you are a beginner with Kubernetes, I would suggest you read the post: Introduction to Kubernetes World
Please note that failed containers are restarted by the kubelet with an exponential back-off delay (10s, 20s, 40s, and so on), capped at five minutes, and the delay is reset after 10 minutes of successful execution. You can refer to the example below for the restartPolicy of the pod. Reference: podRestartPolicy
PodSpec has a restartPolicy field. Its values are Always, OnFailure, and Never, and it applies to all containers in a pod. The default value for restartPolicy is “Always”. Please refer to the config below to understand the point that I just made.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: test-kube
  labels:
    name: myPod
    type: proxy
spec:
  containers:
  - name: nginxcontainer
    image: nginx:latest
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
    ports:
    - containerPort: 80
  restartPolicy: Always
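To quickly confirm the restart policy a running pod actually has, you can query it with jsonpath (a small sketch; the pod and namespace names match the config above):

~ $ kubectl get pod nginx -n test-kube -o jsonpath='{.spec.restartPolicy}'
Always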
Why does a CrashLoopBackOff occur?
There are a number of different reasons for a CrashLoopBackOff. Here are a few:
- The application inside the container keeps crashing.
- Some parameters of the pod/container have been configured incorrectly (see the sketch after this list).
- An error was made while deploying the pod, and so on.
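As a small illustration of the second point, here is a hypothetical pod spec whose command exits immediately, so the container crash-loops under restartPolicy: Always (the name broken-pod and the command are made up for this example):

apiVersion: v1
kind: Pod
metadata:
  name: broken-pod
  namespace: test-kube
spec:
  containers:
  - name: broken
    image: nginx:latest
    # The command exits right away, so the main process dies and
    # Kubernetes keeps restarting the container.
    command: ["/bin/sh", "-c", "echo misconfigured; exit 1"]
  restartPolicy: Always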
How do I see if my pod has a CrashLoopBackOff issue?
You usually see this status when you run the standard command kubectl get pods (you need to pass the namespace when your pod is under a separate/dedicated namespace, in which case the command is kubectl get pods -n <YourNameSpace>).
~ $ kubectl get pods -n test-kube
NAME                         READY   STATUS             RESTARTS   AGE
challenge-7b97fd8b7f-cdvh4   0/1     CrashLoopBackOff   2          60s
Note: test-kube is my namespace.
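If you want to watch the pod cycle through its states in real time, kubectl get pods supports a watch flag (the restart counter keeps climbing while the container keeps exiting):

~ $ kubectl get pods -n test-kube -w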
Troubleshooting steps:
- Step 1: Describe the pod to get more information on this.
Running kubectl describe pod <podname> (in case of a dedicated namespace, run kubectl describe pod <podname> -n <YourNameSpace>) will give you more information on the pod.
~ $ kubectl describe pod challenge-7b97fd8b7f-cdvh4 -n test-kube
Name:         challenge-7b97fd8b7f-cdvh4
Namespace:    test-kube
Priority:     0
Node:         minikube/192.168.99.100
Start Time:   Sun, 28 Jun 2020 20:25:14 +0530
Labels:       os=ubuntu
              pod-template-hash=7b97fd8b7f
Annotations:  <none>
Status:       Running
IP:           172.17.0.4
IPs:
  IP:           172.17.0.4
Controlled By:  ReplicaSet/challenge-7b97fd8b7f
Containers:
  my-name:
    Container ID:   docker://4d397634b294992f80067083933cb37f00da27df3674f4ba383f5d882d9bfc3e
    Image:          ubuntu:latest
    Image ID:       docker-pullable://ubuntu@sha256:747d2dbbaaee995098c9792d99bd333c6783ce56150d1b11e333bbceed5c54d7
    Port:           22/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 28 Jun 2020 20:25:14 +0530
      Finished:     Sun, 28 Jun 2020 20:25:23 +0530
    Ready:          False
    Restart Count:  8
    Limits:
      cpu:     500m
      memory:  500Mi
    Requests:
      cpu:        500m
      memory:     500Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5sl7g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-5sl7g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5sl7g
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned test-kube/challenge-7b97fd8b7f-cdvh4 to minikube
  Normal   Pulled     19m (x4 over 20m)  kubelet, minikube  Successfully pulled image "ubuntu:latest"
  Normal   Created    19m (x4 over 20m)  kubelet, minikube  Created container my-name
  Normal   Started    19m (x4 over 20m)  kubelet, minikube  Started container my-name
  Normal   Pulling    18m (x5 over 20m)  kubelet, minikube  Pulling image "ubuntu:latest"
  Warning  BackOff    3s (x93 over 20m)  kubelet, minikube  Back-off restarting failed container
~ $
We got the description of the pod “challenge-7b97fd8b7f-cdvh4”. The first thing we should focus on in this output is the “Events” section. This tells you what Kubernetes is doing here. Reading the “Events” section from top to bottom tells us:
- the pod was assigned to a node (Successfully assigned test-kube/challenge-7b97fd8b7f-cdvh4 to minikube)
- the kubelet started pulling the image (Successfully pulled image "ubuntu:latest")
- it created a container from the image (Created container my-name)
- it started the container (Started container my-name)
- the pod went into the “Back-off” state (Back-off restarting failed container).
Further, you can use the verbose option with the describe command to see a more detailed description of the pod, which may give you the exact reason, or at least a clue, as to why the pod is exiting. For example: kubectl describe pod <podname> -n test-kube -v=9
For more details on this, you can refer to Kubectl output verbosity and debugging.
Bonus Tip: You can also use the handy command kubectl get events -n test-kube to get the events for the namespace directly.
~ $ kubectl get events -n test-kube
LAST SEEN   TYPE      REASON    OBJECT                           MESSAGE
9m39s       Normal    Pulling   pod/challenge-7b97fd8b7f-cdvh4   Pulling image "ubuntu:latest"
4m38s       Warning   BackOff   pod/challenge-7b97fd8b7f-cdvh4   Back-off restarting failed container
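If the namespace contains many pods, you can narrow the output to a single pod with a field selector (a sketch using the pod name from this demo):

~ $ kubectl get events -n test-kube --field-selector involvedObject.name=challenge-7b97fd8b7f-cdvh4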
The message says that the pod is in Back-off restarting failed container. This most likely means that Kubernetes started the container, and then the container subsequently exited.
As we all know, a Docker container must keep PID 1 running in it, otherwise the container exits (a container exits when its main process exits). In the case of Docker, the process with PID 1 is the main process, and since it is not running, the Docker container gets stopped. When the container gets stopped, Kubernetes will try to restart it (as we have specified spec.restartPolicy as “Always”; for more details, refer: Restart Policy).
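You can also read the last exit code straight from the pod status, which tells you whether the main process finished cleanly or with an error (a sketch; the jsonpath assumes the pod has a single container):

~ $ kubectl get pod challenge-7b97fd8b7f-cdvh4 -n test-kube -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
0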
After restarting it a few times, Kubernetes declares the pod to be in the “Back-off” state. However, Kubernetes will keep on trying to restart the pod. You will also see the pod restart counter (kubectl get pods -n <YourNameSpace>) increasing, as Kubernetes keeps restarting the container but the container keeps on exiting.
~ $ kubectl get pods -n test-kube
NAME                         READY   STATUS             RESTARTS   AGE
challenge-7b97fd8b7f-cdvh4   0/1     CrashLoopBackOff   2          60s
After checking the events of the pod, you will get an idea of why the pod is failing and going into the CrashLoopBackOff state.
If you are unable to find many details, or if you want to debug it more, please go through the next steps below.
- Step 2: Check the logs of the pod:
To view the logs of the pod, execute: kubectl logs <podname> -n <YourNameSpace>.
~ $ kubectl logs challenge-7b97fd8b7f-cdvh4 -n test-kube
Sun Jun 28 14:51:02 UTC 2020
Hello from the Kubernetes cluster
exiting with status 0
~ $
From the above output, you can see that the pod prints some output and then exits (I scripted this output in the deployment file so that I could show you this demo).
If you have a real application, however, this means that your application is exiting for some reason, and hopefully the application logs will tell you, or at least give you a clue about, why it is exiting.
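Keep in mind that kubectl logs shows the current container instance. When the container is crash looping, the interesting logs usually belong to the previous, already-terminated instance, which you can fetch with the --previous (or -p) flag:

~ $ kubectl logs challenge-7b97fd8b7f-cdvh4 -n test-kube --previous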
- Step 3: Look at the liveness/readiness probes:
If you have configured liveness and readiness probes in the deployment file of your application, you can also look at them to find the reason the application is exiting. Describe the pod and check the Liveness and Readiness entries; a failing probe shows up as an event that tells you why the pod was restarted.
kubectl describe pod <podname> -n <YourNameSpace>
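For reference, this is roughly what a livenessProbe looks like (a sketch with illustrative values, not taken from my demo deployment). If the path, port, or timings here are wrong for your application, the kubelet will keep killing and restarting the container, which also ends in CrashLoopBackOff:

livenessProbe:
  httpGet:
    path: /healthz         # probe fails if this endpoint is wrong or unhealthy
    port: 8080
  initialDelaySeconds: 15  # too short a delay can kill slow-starting apps
  periodSeconds: 10
  failureThreshold: 3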
We have discussed the troubleshooting steps in three sections, and we came to know the reason for the pod exiting.
In this case, the pod is exiting because the main process is no longer running in the container. As already stated, the Docker container must keep PID 1 running in it, otherwise the container exits.
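As an extra debugging aid beyond these three steps, you can run the same image interactively outside the failing Deployment and try the container's command by hand (a sketch; the pod name debug-shell is arbitrary, and the pod is removed when you exit the shell):

~ $ kubectl run debug-shell -n test-kube --rm -it --image=ubuntu:latest --restart=Never -- /bin/bash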
How to fix this issue?
We came to know that the pod is failing because no long-running process is left in the container. We will have to add a task that never finishes so that we can keep our pod in the Running state (in this case). I will pass while true; do sleep 20; done; as an argument in the deployment file so that my container will keep on running.
- My Deployment file before fixing the issue:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge
  namespace: test-kube
  labels:
    name: challenge
spec:
  replicas: 1
  selector:
    matchLabels:
      os: ubuntu
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        os: ubuntu
    spec:
      containers:
      - image: ubuntu:latest
        imagePullPolicy: Always
        name: my-name
        command: ["/bin/bash", "-ec"]
        args: ["date; sleep 10; echo 'Hello from the Kubernetes cluster'; sleep 1; echo 'exiting with status 0'; exit 1;"]
        resources:
          limits:
            cpu: "500m"
            memory: "500Mi"
        ports:
        - containerPort: 22
          name: my-name
      restartPolicy: Always
- My Deployment file after fixing the issue:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: challenge
  namespace: test-kube
  labels:
    name: challenge
spec:
  replicas: 1
  selector:
    matchLabels:
      os: ubuntu
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        os: ubuntu
    spec:
      containers:
      - image: ubuntu:latest
        imagePullPolicy: Always
        name: my-name
        command: ["/bin/bash", "-ec"]
        args: ["date; sleep 10; echo 'Hello from the Kubernetes cluster'; sleep 1; while true; do sleep 20; done;"]
        resources:
          limits:
            cpu: "500m"
            memory: "500Mi"
        ports:
        - containerPort: 22
          name: my-name
      restartPolicy: Always
I have added a task that keeps the container running, and now my pod is not failing or exiting.
~ $ kubectl get pods -n test-kube
NAME                        READY   STATUS    RESTARTS   AGE
challenge-5bdf6fc67-mnmhq   1/1     Running   0          2m26s
~ $ kubectl describe pod challenge-5bdf6fc67-mnmhq -n test-kube
Name:         challenge-5bdf6fc67-mnmhq
Namespace:    test-kube
Priority:     0
Node:         minikube/192.168.99.100
Start Time:   Sun, 28 Jun 2020 21:25:14 +0530
Labels:       os=ubuntu
              pod-template-hash=5bdf6fc67
Annotations:  <none>
Status:       Running
IP:           172.17.0.5
IPs:
  IP:           172.17.0.5
Controlled By:  ReplicaSet/challenge-5bdf6fc67
Containers:
  my-name:
    Container ID:  docker://3b6336b34604278c5cd7ed1dbce95e6de8f43254649a940ac5fe455efa5e98a9
    Image:         ubuntu:latest
    Image ID:      docker-pullable://ubuntu@sha256:35c4a2c15539c6c1e4e5fa4e554dac323ad0107d8eb5c582d6ff386b383b7dce
    Port:          22/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -ec
    Args:
      date; sleep 10; echo 'Hello from the Kubernetes cluster'; sleep 1; while true; do sleep 20; done;
    State:          Running
      Started:      Sun, 28 Jun 2020 21:25:21 +0530
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  500Mi
    Requests:
      cpu:        500m
      memory:     500Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5sl7g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-5sl7g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5sl7g
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned test-kube/challenge-5bdf6fc67-mnmhq to minikube
  Normal  Pulling    2m14s      kubelet, minikube  Pulling image "ubuntu:latest"
  Normal  Pulled     2m8s       kubelet, minikube  Successfully pulled image "ubuntu:latest"
  Normal  Created    2m8s       kubelet, minikube  Created container my-name
  Normal  Started    2m8s       kubelet, minikube  Started container my-name
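As a side note, sleep infinity or tail -f /dev/null are common equivalents of the while true loop when you just need a placeholder foreground process (a sketch; sleep infinity works with GNU coreutils, which the ubuntu image ships):

command: ["/bin/bash", "-c", "sleep infinity"]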
Conclusion:
Containers are meant to run to completion. If no process is running, or the process completes its task, the container will exit. The container should keep PID 1 running in it in order to keep running.
In this post, we have gone through how to “Troubleshoot pod CrashLoopBackOff error”. Feel free to comment in case you still face issues debugging this error, or let us know if you face any issue fixing the pod error. We will be happy to provide you with a solution.
My name is Shashank Shekhar. I am a DevOps Engineer, currently working at one of the best companies in India. I have around 5 years of experience in Linux server administration and DevOps tools.
I love working in a Linux environment and learning new things.