Immich, CUDA, and Unkillable Containers: When GPU Memory Won’t Let Go
Why Immich jobs on NVIDIA GPUs can leave Docker and ML workers in an unkillable state until reboot.
1. The scenario observed on Signal Raider
During deep Immich stress testing on Signal Raider, the following pattern emerged:
- Immich ML jobs run with GPU acceleration enabled (CLIP, faces, OCR, video analysis).
- After several heavy jobs, the GPU starts throwing CUDA errors and OOMs.
- Attempting to stop Immich and Docker fails — containers refuse to die.
- Processes remain “alive” but stuck, holding onto GPU allocations that cannot be freed.
- Only a full system reboot restores normal operation and frees the GPU.
The key red flag: the system tried to kill Immich and Docker, but couldn’t, because the GPU memory was tied up in a state the driver could no longer unwind.
2. What’s actually happening under the hood
2.1 Processes enter D‑state (uninterruptible sleep)
When a process is blocked inside the kernel (for example, waiting on I/O or a GPU driver operation), it can enter a state known as D‑state (uninterruptible sleep). In this state:
- Signals are ignored: even SIGKILL cannot terminate it.
- Docker cannot stop the container because the kernel won't reap the process.
- Resources are held (including GPU memory and file descriptors) until the blocking operation completes, which it never does in this failure mode.
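A quick way to confirm this on the host is to scan /proc for tasks in D-state. The sketch below is a minimal example for a Linux host; where the kernel exposes /proc/<pid>/wchan, it also prints the kernel symbol each task is blocked in, which in this failure mode typically points into the GPU driver.

```python
"""List Linux tasks stuck in D-state (uninterruptible sleep) by scanning /proc.

Minimal sketch; run it on the Docker host. Tasks reported here ignore SIGKILL
until the kernel operation they are blocked in completes.
"""
import os


def d_state_processes():
    stuck = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                stat = f.read()
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
            with open(f"/proc/{pid}/wchan") as f:
                wchan = f.read().strip() or "-"
        except OSError:
            continue  # process exited while we were scanning
        # /proc/<pid>/stat is "pid (comm) state ..."; split after the last ')'.
        state = stat.rsplit(")", 1)[1].split()[0]
        if state == "D":
            stuck.append((int(pid), comm, wchan))
    return stuck


if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print(f"PID {pid} ({comm}) is in D-state, blocked in kernel symbol: {wchan}")
```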
2.2 GPU memory in a “wrong” address space
CUDA uses several types of allocations:
- Device memory on the GPU.
- Pinned host memory in system RAM.
- Unified / managed memory shared between CPU and GPU.
- Driver‑managed internal pools used by TensorRT and ONNX Runtime.
In this failure mode, some allocations end up tied to a half-broken GPU context. From the OS perspective, the memory looks "allocated in the wrong address space" only in the sense that it belongs to a context the driver can no longer cleanly release.
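One way to see this from the host is to ask NVML which processes still own GPU memory. The sketch below assumes the nvidia-ml-py bindings (import name pynvml) are installed; a PID that Docker believes is gone but that still appears in this list is exactly a context the driver has not managed to release.

```python
"""Ask NVML which PIDs still hold device memory on GPU 0 (sketch, assumes the
nvidia-ml-py bindings are installed)."""
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"VRAM used: {mem.used / 2**20:.0f} MiB of {mem.total / 2**20:.0f} MiB")
    for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        used = proc.usedGpuMemory
        used_str = f"{used / 2**20:.0f} MiB" if used is not None else "unknown"
        print(f"PID {proc.pid} still holds {used_str} of device memory")
finally:
    pynvml.nvmlShutdown()
```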
2.3 ONNX Runtime + TensorRT lifecycle
Immich’s ML worker uses ONNX Runtime with:
- CUDAExecutionProvider for GPU inference.
- TensorRT for engine optimizations.
Over multiple jobs, the following can happen:
- Large TensorRT workspaces are allocated and partially freed.
- Graph capture buffers are created during CUDA graph optimization.
- Model weights and engine caches remain resident between jobs.
If a job dies during one of these phases, it can leave behind stale allocations and broken state inside the driver.
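The knobs that govern these workspaces and caches are exposed as execution-provider options when the session is created. The sketch below is not Immich's configuration; the model path and the size limits are illustrative assumptions, but the option names are standard ONNX Runtime provider options, and capping them bounds how much state can pile up per session.

```python
"""Bound ONNX Runtime's per-session GPU usage via execution-provider options.
Sketch only: the model path and the size limits are illustrative assumptions,
not Immich's actual configuration."""
import onnxruntime as ort

GiB = 1024 ** 3

trt_options = {
    "device_id": 0,
    "trt_max_workspace_size": 2 * GiB,  # cap TensorRT's scratch workspace
    "trt_engine_cache_enable": True,    # reuse built engines across runs
    "trt_engine_cache_path": "/tmp/trt-cache",
}
cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": 1 * GiB,                     # cap the CUDA EP memory arena
    "arena_extend_strategy": "kSameAsRequested",  # avoid power-of-two overshoot
}

session = ort.InferenceSession(
    "model.onnx",  # hypothetical model file
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        ("CUDAExecutionProvider", cuda_options),  # fallback for unsupported ops
    ],
)
print(session.get_providers())
```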
2.4 Docker is not the boss of the GPU
Docker can only kill containers if:
- The kernel can deliver signals and terminate the processes.
- The processes can exit their kernel wait and release resources.
- The GPU driver is willing to tear down the CUDA context.
In this failure mode, the process is stuck in a GPU driver call that never completes, so neither Docker nor kill -9 can break it free.
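You can watch that chain break from the host: ask Docker to stop the ML container, then check what state it reports afterwards. A minimal sketch; the container name is an assumption based on the default Immich compose file, so adjust it for your deployment.

```python
"""Watch a stop attempt fail from the host. Sketch only: the container name is
an assumption based on the default Immich compose file."""
import subprocess

CONTAINER = "immich_machine_learning"  # adjust to your deployment

# Ask Docker to stop the container: SIGTERM, a 30 s grace period, then SIGKILL.
subprocess.run(["docker", "stop", "-t", "30", CONTAINER], check=False)

# If the main process sits in D-state inside a driver call, it survives both
# signals and the container keeps reporting itself as running.
status = subprocess.run(
    ["docker", "inspect", "-f", "{{.State.Status}}", CONTAINER],
    capture_output=True, text=True, check=False,
).stdout.strip()
print(f"{CONTAINER} status after stop attempt: {status or 'not found'}")
```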
3. Why Immich triggers this under repeated heavy workloads
Immich is designed as a CPU‑first app with optional GPU acceleration. The GPU path is fast, but its lifecycle is not deeply engineered for repeated, heavy ML jobs. Under stress testing, the following pattern emerges:
- First large job – GPU performs well, memory is clean, everything succeeds.
- Second/third job – ONNX Runtime and TensorRT reuse existing allocations and add more.
- Fourth/fifth heavy job – memory becomes fragmented, stale graph capture state exists, TensorRT workspaces linger.
- Eventually – the CUDA allocator hits a state where new allocations fail and old ones cannot be fully freed.
At that point:
- Immich’s ML worker may crash or hang.
- Processes become stuck in D‑state inside GPU driver calls.
- Docker cannot stop or kill the containers.
- The GPU context is effectively poisoned until reboot.
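One mitigation for this accumulation pattern is to isolate each heavy job in a short-lived child process, so that a clean exit lets the driver tear down the entire CUDA context instead of reusing it for the next job. This is a sketch of the pattern, not how Immich's ML worker is actually structured, and it cannot rescue a process that is already stuck in D-state; run_inference_job is a hypothetical stand-in for the real work.

```python
"""Run each heavy GPU job in a short-lived child process so its CUDA context
dies with the process. Sketch of the isolation pattern only."""
from multiprocessing import get_context


def run_inference_job(job_id: int) -> None:
    # Hypothetical placeholder: build the ONNX Runtime session and run the job
    # entirely inside this child process.
    print(f"processing job {job_id} in an isolated CUDA context")


def process_jobs(job_ids):
    ctx = get_context("spawn")  # fresh interpreter, fresh CUDA context per job
    for job_id in job_ids:
        worker = ctx.Process(target=run_inference_job, args=(job_id,))
        worker.start()
        worker.join(timeout=600)  # don't wait forever on a wedged job
        if worker.is_alive():
            worker.kill()         # best effort; a D-state child ignores this too
            worker.join(timeout=30)


if __name__ == "__main__":
    process_jobs(range(5))
```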
4. Recognizing the failure mode in practice
Signs that you’ve hit this GPU lifecycle failure mode:
- Immich logs show repeated CUDA errors and OOMs even though VRAM appears free.
- docker stop / docker kill on the Immich ML container hangs or fails.
- kill -9 on the ML process has no effect.
- nvidia-smi shows processes that won't go away.
- System load may show tasks in D-state (uninterruptible sleep).
Attempts to fix it with:
- Restarting the container → fails or leaves stuck processes.
- Restarting Docker → may help partially, but often leaves GPU processes behind.
- Resetting the GPU via nvidia-smi --gpu-reset → often fails if processes are stuck (a scripted attempt is sketched below).
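For completeness, here is what that reset attempt looks like when scripted. Sketch only: nvidia-smi --gpu-reset must run as root and requires that nothing still holds the GPU, which is precisely what a wedged process prevents.

```python
"""Try a driver-level GPU reset before resorting to a reboot. Sketch only:
requires root, and it is expected to fail while any process still has the
GPU open."""
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--gpu-reset", "-i", "0"],  # reset GPU index 0
    capture_output=True, text=True, check=False,
)
if result.returncode == 0:
    print("GPU reset succeeded; the driver could tear down all contexts.")
else:
    print("GPU reset failed, as expected when processes are wedged:")
    print(result.stderr.strip() or result.stdout.strip())
    print("A full reboot is the only remaining way to clear the GPU state.")
```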
The only reliable fix: reboot the system, which fully resets the GPU driver and context.
5. Practical conclusions for Immich on NVIDIA GPUs
In practice, this leads to a few operational rules:
- Use GPU Immich for private/family archives with incremental uploads and small batches.
- Avoid repeated giant imports back‑to‑back on a single GPU node.
- Expect to reboot occasionally if you do serious torture‑testing or bulk migration.
- Don’t blame Docker when containers become unkillable — the real issue is inside the GPU driver and CUDA allocator.
Architecturally, this is not a reason to abandon Immich. It’s a reason to understand its limits: a powerful, GPU‑accelerated, private photo system that behaves beautifully under normal use, but shows deep ML stack cracks when pushed into repeated, datacenter‑style workloads.