ComfyUI Black Image Output? Fix NaN Latents, VAE Precision, and the GTX 16-Series Trap (2026)


A black image out of ComfyUI is one of the most frustrating failures because nothing crashes. The workflow turns green, the progress bar fills, the preview is solid black, and the terminal usually shows a single quiet line:

RuntimeWarning: invalid value encountered in cast

That line is the whole story. Somewhere upstream, either in the KSampler or in the VAE decode, NaN (not-a-number) values appeared, and when ComfyUI cast the decoded image to 8-bit, the affected pixels collapsed to zero: black. Sometimes it’s gray or a green wash instead, but the root cause is the same family of problems, and almost all of them are precision-related.

There are only about five real causes, and each has a specific fix. This walks through them in the order you should check, fastest first.

What you’ll be able to do after this:

  • Read the one terminal warning that proves it’s a NaN problem (not a bad prompt or model)
  • Apply the right precision flag (--fp32-vae, --cpu-vae, --force-fp32) for your exact card and model
  • Know when the fix is a 200MB VAE download versus a one-word launch flag

If your problem is ComfyUI failing to start, a red node, or “Torch not compiled with CUDA enabled,” that’s a different class of bug — see ComfyUI “Torch not compiled with CUDA enabled” and ComfyUI custom node “IMPORT FAILED”. This article is only for the case where generation runs but the output is black, gray, or green.

Step 0: Confirm it’s actually a NaN problem

Before changing anything, look at the terminal (the console window ComfyUI runs in, not the browser). You’re looking for one of these:

  • RuntimeWarning: invalid value encountered in cast
  • A message mentioning NaN in the latent or “tensor contains NaN/Inf”
  • No error at all, just a silent black PNG saved to output/

If you see invalid value encountered in cast, the image tensor already contained NaNs when it was converted to 8-bit for saving. Those NaNs came from either the sampler (a corrupt latent) or the VAE decode, which points the finger at precision in the diffusion model or the VAE, and that is exactly what the fixes below address. ComfyUI’s own VAE flag help text says the quiet part out loud: --fp16-vae is documented as “Run the VAE in fp16, might cause black images.” Half precision is the usual culprit.

If instead you see a real Python traceback or a red node, stop here — that’s a different problem (out of memory, a bad model file, a broken custom node).
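
The triage above can be sketched as a small log classifier. This is a minimal sketch and not part of ComfyUI: `classify_console_log` is a hypothetical helper, and the marker strings are simply the messages listed in this section, so real log wording may differ between versions.

```python
import re

# Marker taken from the warning quoted above; actual wording may vary by version.
CAST_WARNING = "invalid value encountered in cast"
# Matches "NaN" as a standalone word, e.g. in "tensor contains NaN/Inf".
NAN_WORD = re.compile(r"\bnan\b", re.IGNORECASE)


def classify_console_log(log_text: str) -> str:
    """Hypothetical helper: sort a console log into 'traceback', 'nan', or 'silent'."""
    lowered = log_text.lower()
    # A Python traceback means a crash (OOM, bad model file, broken node), not this problem.
    if "traceback (most recent call last)" in lowered:
        return "traceback"
    if CAST_WARNING in lowered or NAN_WORD.search(log_text):
        return "nan"
    # No message at all: a silent black PNG is still consistent with NaN output.
    return "silent"
```

Paste the console output into a string and call `classify_console_log(text)`. Only a result of "traceback" means you should leave this article for a different guide.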

Fix 1: The SDXL fp16 VAE NaN bug (the classic one)

This is the single most common cause for anyone running SDXL or SDXL-based checkpoints. The original SDXL VAE that Stability shipped numerically overflows in fp16: certain activations exceed the fp16 range, become Inf, then NaN, and you get a black image. It’s not your GPU — the VAE itself is broken in half precision.
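
The overflow chain described above (out of range, then Inf, then NaN) can be reproduced with nothing but the Python standard library. This is an illustrative sketch of half-precision arithmetic, not the VAE’s actual code; `to_fp16` is a hypothetical helper, and the value 70000.0 is an arbitrary stand-in for an oversized activation.

```python
import math
import struct

# Largest finite IEEE 754 half-precision value: (2 - 2**-10) * 2**15
FP16_MAX = 65504.0


def to_fp16(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision.

    struct raises OverflowError for values too large for fp16, whereas
    IEEE 754 arithmetic rounds them to infinity, so this sketch returns inf.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


activation = to_fp16(70000.0)          # beyond FP16_MAX, so it becomes inf
difference = activation - activation   # inf - inf is undefined, so it becomes nan

print(to_fp16(FP16_MAX), activation, difference)
```

One oversized activation is enough: once an Inf meets a subtraction or a normalization step, the NaN spreads through every later operation that touches it.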

Three ways to fix it, in order of preference:

Option A — swap in the fp16-safe VAE (best). Download sdxl-vae-fp16-fix, a version of the SDXL VAE retrained to stay inside fp16 range without producing NaNs. Drop sdxl_vae_fp16_fix.safetensors into ComfyUI/models/vae/, add a Load VAE node, point it at that file, and wire it into your VAE Decode. This keeps the speed and low VRAM of fp16 while killing the black images. It’s built on the SDXL 0.9 VAE and works with SDXL 0.9 and 1.0 checkpoints.

Option B — force the VAE to fp32. Launch with:

python main.py --fp32-vae

This runs the VAE in full precision. No NaNs, slightly more VRAM and a little slower on decode, but it always works. On a 24GB card the cost is negligible.

Option C — run the VAE on CPU. If you’re VRAM-starved and fp32 VAE tips you into an out-of-memory error:

python main.py --cpu-vae

Community reports are consistent that running the VAE on CPU reliably eliminates the SDXL black-image issue. It’s slow — decode can take several seconds instead of a fraction of one — but it’s bulletproof when nothing else fits. If --cpu-vae then throws an OOM on system RAM, see CUDA out of memory fixes.

For most SDXL users, Option A is the right answer: it’s a one-time download and you never think about it again.

Fix 2: Wrong VAE for the model

If there’s no Load VAE node in your workflow, ComfyUI uses whatever VAE is baked into the checkpoint — and if that’s missing or mismatched, you get black or garbage output. The mismatch that bites people most: using an SD 1.5 VAE with an SDXL model (or vice versa). They are not interchangeable. An SD 1.5 VAE on SDXL produces black or noise-filled images every time.

Fix: add an explicit Load VAE node and point it at the correct file:

  • SDXL / SDXL checkpoints → sdxl_vae.safetensors (or the fp16-fix version from Fix 1)
  • SD 1.5 checkpoints → the SD 1.5 VAE (vae-ft-mse-840000-ema-pruned.safetensors)
  • FLUX → the FLUX VAE (ae.safetensors)

Make sure the file actually exists in ComfyUI/models/vae/. An empty or zero-byte download will also decode to black.
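
A quick pre-flight check for the missing or zero-byte case might look like the sketch below. `check_vae_file` is a hypothetical helper, and the 1 MB floor is an arbitrary assumption meant only to catch truncated downloads; it says nothing about whether the VAE matches your model.

```python
from pathlib import Path


def check_vae_file(path: Path, min_bytes: int = 1_000_000) -> str:
    """Hypothetical check: report whether a VAE file is missing, empty, tiny, or plausible."""
    if not path.is_file():
        return "missing"
    size = path.stat().st_size
    if size == 0:
        return "empty"
    if size < min_bytes:  # arbitrary sanity floor, not an official threshold
        return "suspiciously small"
    return "ok"


if __name__ == "__main__":
    # Folder and file names as used in this article; adjust the root to your install.
    vae_dir = Path("ComfyUI/models/vae")
    for name in (
        "sdxl_vae.safetensors",
        "vae-ft-mse-840000-ema-pruned.safetensors",
        "ae.safetensors",
    ):
        print(f"{name}: {check_vae_file(vae_dir / name)}")
```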

Fix 3: The GTX 16-series fp16 trap

This one is hardware, not software. The GTX 1650 / 1660 / 1660 Ti / 1660 Super have a known defect in fp16 support — they’ll produce an all-black or all-green image in half precision unless you force full precision. ComfyUI maintainers and a long-running issue thread (#884) confirm the behavior: on a 16-series card, fp16 is unreliable.

The fix is a single flag:

python main.py --force-fp32

One user reported this “eliminated 99% of black images and crashes” on a GTX 16-series setup. It sets the UNet, VAE, and text encoder all to fp32 — the most accurate and slowest path — but on these cards it’s the only stable option. Expect generation to be noticeably slower; the 16-series was never a fast inference card to begin with.
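
If you launch ComfyUI from a wrapper script, the card check can be automated. The sketch below is an assumption-heavy illustration: `extra_launch_flags` is a hypothetical helper, and the pattern covers only the GTX 1650 and 1660 families named above.

```python
import re

# Matches the models named above (GTX 1650 / 1660, including Ti and Super variants).
GTX_16_SERIES = re.compile(r"\bGTX\s*16[56]0\b", re.IGNORECASE)


def extra_launch_flags(gpu_name: str) -> list[str]:
    """Hypothetical helper: return --force-fp32 for the cards this section covers."""
    return ["--force-fp32"] if GTX_16_SERIES.search(gpu_name) else []


print(extra_launch_flags("NVIDIA GeForce GTX 1660 Ti"))
print(extra_launch_flags("NVIDIA GeForce RTX 3090"))
```

The name string could come from `torch.cuda.get_device_name(0)` in the same Python environment ComfyUI uses, or from whatever your driver tools report.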

If you’re hitting this, it’s worth being honest about the hardware: a card with broken fp16 in 2026 is a hard floor on your local AI experience. A used RTX 3090 (24GB, ~936 GB/s, proper fp16/bf16) or a new RTX 5060 Ti 16GB sidesteps every fp16 precision bug in this article and runs FLUX and SDXL at sane speeds. See the GPU buying guide if you’re weighing an upgrade.

Fix 4: Extreme CFG and bad sampler settings

NaNs don’t only come from precision. Pushing the CFG scale far outside its sane range can blow up the math too. CFG above ~15 (and sometimes below ~2) can drive values out of range and produce NaN latents, especially on models tuned for low CFG.

Check your KSampler:

  • Standard SDXL / SD 1.5: CFG around 6–8 is the safe zone. If you’ve cranked it to 20 chasing “more prompt adherence,” pull it back to 7 and test.
  • Turbo / Lightning / distilled models: these want CFG 1–2. Running a Turbo model at CFG 7 can give you fried or NaN output.
  • FLUX: uses its own guidance node, not classic CFG — leave the KSampler CFG at 1.0 and use the FluxGuidance node instead. See FLUX.1 Kontext Dev local setup for the correct FLUX graph.

Also sanity-check that you didn’t accidentally set steps to 1 or pick a sampler/scheduler combo the model wasn’t trained for. A mismatched scheduler can produce noise that looks like a “failed” black-ish image.
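
The CFG guidance above can be written down as a small lookup. This is a sketch of the rules of thumb in this section, not a ComfyUI feature: `check_cfg` and the family names are hypothetical, and the ranges are the ones quoted in the list.

```python
# Ranges taken from the guidance above; they are rules of thumb, not hard limits.
CFG_RANGES = {
    "standard": (6.0, 8.0),   # SDXL / SD 1.5
    "distilled": (1.0, 2.0),  # Turbo / Lightning
    "flux": (1.0, 1.0),       # real guidance goes through the FluxGuidance node
}


def check_cfg(family: str, cfg: float) -> str:
    """Hypothetical helper: flag a KSampler CFG value outside the suggested range."""
    low, high = CFG_RANGES[family]
    if low <= cfg <= high:
        return "ok"
    target = low if cfg < low else high
    return f"cfg {cfg} is outside {low}-{high} for {family}; try {target}"


print(check_cfg("standard", 20))
print(check_cfg("distilled", 7.0))
```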

Fix 5: The new-model NaN bug (Z-Image and friends, 2026)

Newer architectures sometimes ship before their fp16/bf16 paths are stable in ComfyUI. The clearest 2026 example: Z-Image Base (Tongyi-MAI). Multiple reports — including ComfyUI issue #13123 (filed March 2026) and the upstream Z-Image issue #14 — show that Z-Image Base outputs silent black images regardless of precision, throwing the same invalid value encountered in cast warning, while Z-Image Turbo works perfectly in the identical workflow. Running the model in torch.float16 produces pure black (NaN latents); the root cause is fp16 handling in the UNet/scheduler, not the user’s setup, and it reproduced even on an RX 7900 XTX via ROCm/ZLUDA with custom nodes disabled.

When a brand-new model gives you black output:

  1. Try the Turbo / distilled variant if one exists — it’s often the only one with a stable fp16 path at launch.
  2. Force bf16 or fp32 on the UNet with --bf16-unet or --fp32-unet. bf16 has a much wider exponent range than fp16 and survives the overflows that kill fp16. On RTX 30/40/50-series cards bf16 is nearly free.
  3. Update ComfyUI — these are upstream bugs that get patched. Pull the latest, since the model-specific fix often lands within days. (Updating can also break custom nodes — if a node goes red after, see the IMPORT FAILED guide.)
  4. Wait, or use a GGUF/quantized build that the community has already validated.

This is also why a “factory reset” doesn’t fix it (as in issue #13116, where black/gray and NaN latents persisted across models on both Windows and Linux after a clean reinstall): if the bug is in the precision path for a specific architecture, reinstalling ComfyUI changes nothing. The flag or the model variant is the lever, not the install.

Quick diagnostic table

| Symptom | Most likely cause | First thing to try |
|---|---|---|
| Black SDXL images, invalid value in cast | fp16 VAE NaN bug | sdxl-vae-fp16-fix VAE, or --fp32-vae |
| Black or green on GTX 1650/1660 | 16-series fp16 defect | --force-fp32 |
| Black/garbage, no Load VAE node | Wrong or missing VAE | Add Load VAE, match it to the model |
| Fried/black at high CFG | NaN from extreme CFG | Drop CFG to 6–8 (1–2 for Turbo) |
| New model black, Turbo variant works | Upstream fp16 bug | --bf16-unet, update ComfyUI |
| Black only when VRAM-tight | VAE OOM falling back badly | --cpu-vae |

The order I actually check things

When a previously-working setup starts producing black images, the fastest path is:

  1. Read the terminal. Confirm invalid value encountered in cast or a NaN message (a silent black PNG with no message also counts). A traceback or a red node instead → it’s not this problem.
  2. What model? SDXL → it’s almost certainly Fix 1. Brand-new architecture → Fix 5. SD 1.5 that worked yesterday → check you didn’t change CFG or VAE.
  3. What card? GTX 16-series → Fix 3, immediately. Everything else → precision flag on VAE first.
  4. Apply one flag at a time and restart the server. ComfyUI reads launch flags at startup, so a flag change means a full restart of the ComfyUI process — refreshing the browser tab does nothing.

The mistake that wastes the most time is changing five things at once and restarting once. Change one flag, restart, generate a test image, then move on. Precision bugs are deterministic — the right flag fixes it on the first try.
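
The one-change-per-restart discipline can be sketched as a generator that emits one launch command per flag. The flags are the ones named in this article, but their order here is an assumption (VAE first, then the diffusion model, then the blunt options), and `single_flag_commands` is a hypothetical helper.

```python
import shlex
import sys

# Flags named in this article; the ordering is an assumption, not an official recommendation.
CANDIDATE_FLAGS = ["--fp32-vae", "--bf16-unet", "--force-fp32", "--cpu-vae"]


def single_flag_commands(flags=CANDIDATE_FLAGS, entry="main.py"):
    """Yield one launch command per flag so each restart tests exactly one change."""
    for flag in flags:
        yield [sys.executable, entry, flag]


for command in single_flag_commands():
    # Print rather than launch: run one, generate a test image, stop the server, then try the next.
    print(shlex.join(command))
```

The commands use the current interpreter via `sys.executable`; on the Windows portable build you would substitute the embedded Python path instead.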

A note on bf16 vs fp32

If you have the choice, bf16 is usually the better fix than fp32. bf16 (--bf16-vae, --bf16-unet) keeps the wide numeric range that prevents NaN overflow but uses half the memory and runs faster than fp32. Every RTX 30-, 40-, and 50-series card supports it natively. Reach for fp32 (--force-fp32, --fp32-vae) only when bf16 still misbehaves or on older cards (GTX 16-series, some Pascal) where bf16 support is weak or absent. For the inverse problem — when you want maximum speed and lowest VRAM with the right low-precision format — see ComfyUI NVFP4 on RTX 50-series.
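
The range argument is easy to check from the bit layouts alone. The sketch below computes the largest finite value for each format from its exponent and mantissa widths (fp16: 5 and 10 bits, bf16: 8 and 7, fp32: 8 and 23); `max_finite` is a hypothetical helper written for this illustration.

```python
def max_finite(exponent_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-style binary float with the given field widths."""
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent is reserved for inf/NaN, so the largest usable one is 2**bits - 2.
    max_exponent = (2 ** exponent_bits - 2) - bias
    return (2 - 2 ** -mantissa_bits) * 2.0 ** max_exponent


FORMATS = {
    "fp16": (5, 10),
    "bf16": (8, 7),
    "fp32": (8, 23),
}

for name, (exp_bits, man_bits) in FORMATS.items():
    print(f"{name}: max finite = {max_finite(exp_bits, man_bits):.4g}")
```

bf16 shares fp32’s 8-bit exponent, so its ceiling sits at roughly the same order of magnitude as fp32 while fp16 tops out at 65504. The price is precision: bf16 keeps only 7 mantissa bits.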

FAQ

Is a black image ever caused by a bad prompt? No. A bad prompt gives you a wrong image, not a black one. Black/gray/green output is a numerical failure (NaN/Inf in the latent), not a content problem. Check precision, not your wording.

Why does the preview look fine but the final image is black? ComfyUI’s live preview often uses a fast latent-to-RGB approximation (latent2rgb / TAESD), which can render something while the real VAE decode produces NaNs. The preview and the final decode are different code paths — trust the saved PNG, not the preview.

Does --cpu-vae slow everything down? Only the VAE decode step, not sampling. On most workflows that’s the last few seconds. It’s the safest fix for VAE-side NaNs when you can’t tell which VAE precision flag you need, at the cost of a slower decode. It does nothing if the latent is already NaN before decode.

I’m on a Mac (Apple Silicon) and getting black images. Mac/MPS has its own fp16 quirks with some models. Try --fp32-vae first, and --force-fp32 if that doesn’t do it. The trade-off is speed, but MPS correctness beats a fast black square.

I added the fp16-fix VAE and it’s still black. Then your problem isn’t the VAE — it’s the UNet producing NaNs upstream (Fix 3, 4, or 5). The fp16-fix VAE only solves the VAE overflow. If the latent is already NaN before decode, you need a precision flag on the UNet or a CFG fix.

Where do I put launch flags on the Windows portable build? Edit run_nvidia_gpu.bat and add the flag after main.py, e.g. .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --fp32-vae. Save and run the .bat again.

For the broader open-source image-generation toolchain and model licensing, aifoss.dev tracks the FOSS side; for AI coding tools rather than image gen, see aicoderscope.com.


Last updated June 22, 2026. ComfyUI flags and model-specific bugs change between versions; verify against the current build before relying on a workaround.
