From 38e5f02a001030cbb3bd237e1eba5af94442b17c Mon Sep 17 00:00:00 2001
From: Kpa-clawbot
Date: Tue, 31 Mar 2026 16:00:31 -0700
Subject: [PATCH] ci: add Docker image cleanup to prevent runner disk exhaustion (#333)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Problem

The self-hosted runner (`meshcore-runner-2`) filled its 29GB disk to 100%, blocking all CI runs:

```
Filesystem      Size  Used Avail Use%
/dev/root        29G   29G  2.3M 100%
Docker Images: 67 total, 2 active, 18.83GB reclaimable (99%)
```

Root cause: no Docker image cleanup after builds. Each CI run builds a new image but never prunes old ones.

## Fix

### 1. Docker image cleanup after deploy (`deploy` job)

- Runs with `if: always()` so it executes even if deploy fails
- `docker image prune -af --filter "until=24h"`: removes unused images older than 24h (safe: the current build is minutes old)
- `docker builder prune -f --keep-storage=1GB`: caps the build cache at 1GB
- Logs `docker system df` before and after for visibility

### 2. Runner log cleanup at start of E2E job

- Prunes runner diagnostic logs older than 3 days (was 53MB and growing)
- Reports `df -h` for disk visibility in CI output

## Impact

After manual cleanup today, disk usage went from 100% → 35% (19GB free). This PR prevents recurrence.

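As a possible follow-up to the `df -h` reporting above, the disk check could also fail loudly when usage crosses a limit. The sketch below is not part of this PR: the helper names `usage_percent` and `disk_over_threshold` are hypothetical, and it assumes POSIX-format `df -P` output so that the fifth column is always the capacity percentage.

```shell
# Hypothetical helpers, not part of this PR.

# usage_percent: read `df -P` output on stdin and print the capacity
# of the last listed filesystem as a bare integer (for example 35).
usage_percent() {
  awk 'NR > 1 { gsub(/%/, "", $5); print $5 }' | tail -1
}

# disk_over_threshold PERCENT LIMIT: succeed when PERCENT >= LIMIT.
disk_over_threshold() {
  [ "$1" -ge "$2" ]
}
```

In a workflow step this might be called as `used="$(df -P / | usage_percent)"` followed by `disk_over_threshold "$used" 90`, where 90 is an arbitrary illustrative limit rather than a value taken from the runner.
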
## Test plan

- [x] Manual cleanup verified on runner via `az vm run-command`
- [ ] Next CI run should show cleanup step output in deploy job logs

Co-authored-by: Kpa-clawbot <259247574+Kpa-clawbot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/workflows/deploy.yml | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
index 9421076b..1cfffff7 100644
--- a/.github/workflows/deploy.yml
+++ b/.github/workflows/deploy.yml
@@ -129,6 +129,13 @@ jobs:
         with:
           fetch-depth: 0
 
+      - name: Free disk space
+        run: |
+          # Prune old runner diagnostic logs (can accumulate 50MB+)
+          find ~/actions-runner/_diag/ -name '*.log' -mtime +3 -delete 2>/dev/null || true
+          # Show available disk space
+          df -h / | tail -1
+
       - name: Set up Node.js 22
         uses: actions/setup-node@v5
         with:
@@ -314,6 +321,17 @@ jobs:
             exit 1
           fi
 
+      - name: Clean up old Docker images
+        if: always()
+        run: |
+          # Remove unused images older than 24h (keeps current build)
+          echo "--- Docker disk usage before cleanup ---"
+          docker system df
+          docker image prune -af --filter "until=24h" 2>/dev/null || true
+          docker builder prune -f --keep-storage=1GB 2>/dev/null || true
+          echo "--- Docker disk usage after cleanup ---"
+          docker system df
+
   # ───────────────────────────────────────────────────────────────
   # 5. Publish Badges & Summary (master only)
   # ───────────────────────────────────────────────────────────────