Prometheus full - no data being displayed in Grafana

Knowledge Base - Solution

Wednesday, March 16, 2022

Problem:

The Prometheus pod logs report "no space left on device". Note that the pod names will not be the same on your system.

    k logs -f nci-service-monitoring-default-prometheus-758755bd9b-zw75v -n nci-service-monitoring-default
    level=warn ts=2022-03-14T15:34:34.862Z caller=manager.go:619 component="rule manager" group=elasticsearch-alert-rules msg="Rule sample appending failed" err="write to WAL: log samples: write /prometheus/wal/00000604: no space left on device"

Running df inside the pod confirms that the /prometheus volume is full:

    k exec -it -n nci-service-monitoring-default nci-service-monitoring-default-prometheus-5ff8b957d-6ps7p -- sh
    /prometheus $ df -h | grep rb
    /dev/rbd4 10G 9.7G 0G 98% /prometheus
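The df check above can be scripted. The following is a minimal sketch, not part of the original procedure: the function names usage_percent and volume_is_full and the default threshold of 90 are hypothetical, and it assumes the df line has the same column layout as the sample above (Use% in the fifth column).

```shell
#!/bin/sh
# Hypothetical helpers (not from the original article).

# Print the Use% value (5th column) of ONE data line of `df -h` output,
# without the trailing percent sign.
usage_percent() {
    printf '%s\n' "$1" | awk '{ sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) when the volume described by the df line in $1
# is at or above the threshold in $2 (assumed default: 90 percent).
volume_is_full() {
    pct=$(usage_percent "$1")
    [ "$pct" -ge "${2:-90}" ]
}
```

For example, with the line from the output above stored in a variable:

    line='/dev/rbd4 10G 9.7G 0G 98% /prometheus'
    volume_is_full "$line" && echo "/prometheus is nearly full"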

Solution:

Solution 1: Log onto the pod and remove the files in the wal directory.

    k exec -it -n nci-service-monitoring-default nci-service-monitoring-default-prometheus-5ff8b957d-6ps7p -- sh
    cd /prometheus/wal/
    rm *

If you cannot remove the files from within the pod, find out which server the pod is running on (nciloader2 in our example):

    [protean@nciloader3 ~]$ k get pods -A -o wide | grep -i prometheus
    nci-service-monitoring-default nci-service-monitoring-default-prometheus-5ff8b957d-6ps7p 1/1 Running 0 24h 10.244.1.218 nciloader2

SSH onto that server and run df to see which volume is full:

    df -h | grep rbd

cd to that volume's mount directory, where you should see the wal directory. cd into wal and remove the files.

Solution 2: Try resizing the RBD PersistentVolume that backs the pod:
https://conf1.ds.jdsu.net/wiki/display/NCIR/Resize+the+RBD+PersistentVolume+in+Cluster?src=contextnavpagetreemode
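For the step in solution 1 that finds which server a pod runs on, the node name can be pulled out of the wide pod listing instead of being read by eye. This is a rough sketch under stated assumptions: the function name node_for_pod is hypothetical, and it assumes the column order shown in the example output (pod name in column 2, node in column 8) with no spaces inside the RESTARTS or AGE values.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article).
# Reads `k get pods -A -o wide` style lines on stdin and prints the node
# (8th column) for the pod whose name (2nd column) equals $1.
node_for_pod() {
    awk -v pod="$1" '$2 == pod { print $8 }'
}
```

It would be used by piping the listing into it:

    k get pods -A -o wide | node_for_pod nci-service-monitoring-default-prometheus-5ff8b957d-6ps7p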
