Summary
Error
Cause
This issue is caused by a damaged smarta data file. Because the smarta data is stored on an NFS file system, the data can be corrupted if a machine shuts down suddenly.
The solution is to remove the broken data file and restart the pod. A new data file is generated during the restart.
Fix
Solution:
1. Check for deployments whose AVAILABLE count is 0
kubectl get deployment -n itsma2
In this case 4 deployments have the issue:
smarta-smap-con-1b
smarta-smap-qms
smarta-ss-community
smarta-ss-con-1b
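When the namespace holds many deployments, the unavailable ones can be filtered out instead of read by eye. The following is a minimal sketch and not part of the original procedure: the helper name unavailable_deployments is hypothetical, and it assumes the default table output of kubectl get deployment, which starts with a header row containing an AVAILABLE column. The column is located by name because its position differs between kubectl versions.

```shell
# Hypothetical helper: print the names of deployments whose AVAILABLE count is 0.
# Reads the default table output of `kubectl get deployment` on stdin.
unavailable_deployments() {
  awk '
    NR == 1 {
      # Locate the AVAILABLE column by its header name
      for (i = 1; i <= NF; i++) if ($i == "AVAILABLE") col = i
      next
    }
    col && $col == 0 { print $1 }
  '
}

# Usage (assumes kubectl is already configured for the cluster):
#   kubectl get deployment -n itsma2 | unavailable_deployments
```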
2. Scale the deployment down by setting replicas to 0
kubectl scale deployment smarta-ss-con-1b --replicas=0 -n itsma2
3. Check the pod status
kubectl get pods --all-namespaces --show-all -o wide | grep smarta
The pod smarta-ss-con-1b is Terminating; wait until it has terminated.
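The wait can be scripted rather than checked by hand. The sketch below is an illustration under stated assumptions, not the documented procedure: wait_for_pods_gone is a hypothetical helper that polls kubectl get pods until no pod name in the namespace starts with the given prefix, and the default timeout (300 seconds) and poll interval (5 seconds) are arbitrary choices.

```shell
# Hypothetical helper: poll until no pod whose name starts with PREFIX is left
# in NAMESPACE, or give up after TIMEOUT seconds.
# Arguments: NAMESPACE PREFIX [TIMEOUT=300] [INTERVAL=5]
wait_for_pods_gone() {
  ns=$1
  prefix=$2
  timeout=${3:-300}
  interval=${4:-5}
  waited=0
  # The loop continues while at least one matching pod is still listed
  while kubectl get pods -n "$ns" --no-headers 2>/dev/null | grep -q "^$prefix"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $prefix pods to terminate" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage:
#   wait_for_pods_gone itsma2 smarta-ss-con-1b
```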
4. Back up the old content1b folder
cd /var/vols/itom/itsma/itsma-itsma1-smartanalytics/data/idol/ss/
mv content1b content1bbak
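One caveat with a fixed backup name: if a content1bbak directory already exists from an earlier repair, mv moves content1b inside it instead of renaming it. A possible safeguard is sketched below. The helper backup_dir and the timestamped naming scheme are assumptions made for illustration, not part of the original fix.

```shell
# Hypothetical helper: rename DIR to DIR.bak-<timestamp> and print the new name.
# Refuses to proceed if DIR is missing or the backup name is already taken.
backup_dir() {
  dir=${1%/}
  backup="$dir.bak-$(date +%Y%m%d-%H%M%S)"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  if [ -e "$backup" ]; then
    echo "backup already exists: $backup" >&2
    return 1
  fi
  mv "$dir" "$backup" && echo "$backup"
}

# Usage, run from the data directory of the affected component:
#   backup_dir content1b
```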
5. Scale the deployment back up by setting replicas to 1
kubectl scale deployment smarta-ss-con-1b --replicas=1 -n itsma2
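6. Restart the pod smarta-ss-community by deleting it (replace * with the actual pod name suffix; kubectl does not expand wildcards in pod names)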
kubectl delete pod smarta-ss-community-* -n itsma2
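Because the generated pod suffix has to be looked up first, the lookup and the delete can be combined. This is a rough sketch under assumptions: delete_pods_by_prefix is a hypothetical helper, and it relies on kubectl get pods -o name printing one resource per line in the form pod/NAME.

```shell
# Hypothetical helper: delete every pod in NAMESPACE whose name starts with PREFIX.
# Arguments: NAMESPACE PREFIX
delete_pods_by_prefix() {
  ns=$1
  prefix=$2
  # `-o name` prints one resource per line, e.g. pod/smarta-ss-community-xxx
  pods=$(kubectl get pods -n "$ns" -o name | grep "/$prefix") || {
    echo "no pod matching $prefix in $ns" >&2
    return 1
  }
  # $pods is intentionally unquoted so that each name becomes its own argument
  kubectl delete -n "$ns" $pods
}

# Usage:
#   delete_pods_by_prefix itsma2 smarta-ss-community
```

The deployment recreates the deleted pod automatically, which is what makes deleting it equivalent to a restart.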
7. Check the pod status
kubectl get pods --all-namespaces --show-all -o wide | grep smarta
The pod smarta-ss-community-xxx is Running, but the containers of 2 pods are still not up:
smarta-smap-con-1b-xxx
smarta-smap-qms-xxx
8. Scale these 2 deployments down by setting replicas to 0; their pods are then terminated
kubectl scale deployment smarta-smap-con-1b --replicas=0 -n itsma2
kubectl scale deployment smarta-smap-qms --replicas=0 -n itsma2
9. Back up the old content1b folder
cd /var/vols/itom/itsma/itsma-itsma1-smartanalytics/data/idol/smsp/
mv content1b content1bbak
10. Scale these 2 deployments back up by setting replicas to 1
kubectl scale deployment smarta-smap-con-1b --replicas=1 -n itsma2
kubectl scale deployment smarta-smap-qms --replicas=1 -n itsma2
11. Wait until all smarta pods are Running correctly.
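The final check can also be expressed as a filter that prints only the smarta pods still needing attention. The sketch below is illustrative: smarta_pods_not_running is a hypothetical helper, and it assumes the table output of kubectl get pods with a header row containing NAME, READY and STATUS columns. A pod is reported when its STATUS is not Running or when its READY counts differ (for example 0/1), which covers the "container is not up" case.

```shell
# Hypothetical helper: print smarta pods that are not Running or not fully ready.
# Reads the table output of `kubectl get pods` (including the header row) on stdin.
smarta_pods_not_running() {
  awk '
    NR == 1 {
      # Locate the needed columns by their header names
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME") name = i
        if ($i == "READY") ready = i
        if ($i == "STATUS") status = i
      }
      next
    }
    name && ready && status && $name ~ /^smarta/ {
      split($ready, r, "/")
      if ($status != "Running" || r[1] != r[2]) print $name, $ready, $status
    }
  '
}

# Usage: empty output means every smarta pod is Running and ready
#   kubectl get pods --all-namespaces -o wide | smarta_pods_not_running
```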