Flux vs SDXL vs SD 1.5: Real Cost-per-Image Across GPUs (2026)
Three generations of image models now live in a typical ComfyUI installation (Windows users: see our ComfyUI Windows setup guide), and the choice between them isn’t obvious. SD 1.5 still commands the deepest fine-tune ecosystem ever built around a single model. SDXL is the default backbone for most home-lab artists. Flux.1 produces images that read as professional photography, and it handles human hands, readable in-image text, and complex lighting in ways that SD 1.5 and SDXL can’t reliably match.
The tradeoff is hardware. Flux requires 12–24 GB VRAM and, at FP8, takes roughly 3–4× longer per image than SDXL on the same GPU (longer still at FP16). Whether that matters depends on how many images you generate per session and what GPU you’re running. This article quantifies those costs: verified generation times across two GPU tiers, converted into dollar-per-image electricity costs at the current US average of $0.182/kWh (EIA 2026 forecast), and a cloud comparison that tells you when a $30/month Midjourney subscription is still the smarter call.
The Three Models at a Glance
| Model | Architecture | Parameters | Native Resolution | VRAM (FP16) | VRAM (FP8/GGUF) |
|---|---|---|---|---|---|
| SD 1.5 | U-Net | 860M | 512×512 | 4–6 GB | — |
| SDXL 1.0 | U-Net (dual) | 3.5B | 1024×1024 | 8–12 GB | — |
| Flux.1 Dev | DiT transformer | 12B | 1024×1024 | ~24 GB | 12–14 GB |
| Flux.1 Schnell | DiT transformer | 12B | 1024×1024 | ~24 GB | 12–14 GB |
SD 1.5 and SDXL use U-Net architectures — compact, fast, designed for iterative denoising. Flux uses a Diffusion Transformer (DiT) architecture at 12 billion parameters. The quality jump is observable and consistent: Flux renders legible text in generated images, renders human anatomy with significantly fewer errors, and handles complex multi-element compositions more coherently. SDXL cannot do any of these reliably.
Dev vs Schnell: Flux.1 Schnell uses knowledge distillation to produce usable images in 4 steps instead of the 20+ steps Flux Dev requires. Schnell is Apache 2.0 licensed; Dev carries a non-commercial research restriction. For personal home use, either is legally fine. Schnell is faster, but most users running quality-critical work prefer Dev at 20 steps for the added detail — especially for photorealistic subjects.
Raw Speed: Verified Benchmarks
The SDXL numbers below come from ComfyUI’s public benchmark thread (Discussion #2970), which aggregated community-submitted hardware results for SDXL 1.0 at 1024×1024, 20 steps in ComfyUI. The Flux.1 Dev numbers come from ComfyUI Discussion #4571 (RTX 4090 Flux benchmarks, multiple contributors). SD 1.5 timings are derived from Automatic1111 community benchmarks; the 4090 vs 3090 ratio is confirmed by Tom’s Hardware testing.
SDXL at 1024×1024, 20 steps
| GPU | it/s | Sec/Image |
|---|---|---|
| RTX 3070 8 GB | 2.26 | 8.8 s |
| RTX 3090 24 GB | 3.61 | 5.5 s |
| RTX 4080 16 GB | 3.53 | 5.7 s |
| RTX 4090 24 GB | 7.61 | 2.6 s |
The RTX 3090 and RTX 4080 16GB land within 3% of each other on SDXL — roughly equal inference speed despite the VRAM difference. The RTX 4090 pulls ~2× ahead.
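The Sec/Image column follows directly from the it/s column: sampling time is the step count divided by iterations per second. A minimal sketch of that conversion, using the table's figures (the function name is illustrative, and real runs add model-load and VAE-decode overhead that this ignores):

```python
def seconds_per_image(steps: int, it_per_s: float) -> float:
    """Sampling time for one image, ignoring load and VAE-decode overhead."""
    return steps / it_per_s


# it/s values from the SDXL table above (1024x1024, 20 steps)
SDXL_IT_PER_S = {
    "RTX 3070 8 GB": 2.26,
    "RTX 3090 24 GB": 3.61,
    "RTX 4080 16 GB": 3.53,
    "RTX 4090 24 GB": 7.61,
}

for gpu, its in SDXL_IT_PER_S.items():
    print(f"{gpu}: {seconds_per_image(20, its):.1f} s/image")
```

The same function covers the other tables: pass 50 steps for the SD 1.5 rows or 4 steps for Schnell.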
Flux.1 Dev at 1024×1024, 20 steps
| GPU | Precision | Sec/Image |
|---|---|---|
| RTX 4090 | FP8 + --fast | 9–10 s |
| RTX 4090 | Q8 GGUF | 15–17 s |
| RTX 4090 | FP16 full | 18–41 s |
| RTX 3090 | FP8 | ~14–18 s |
The FP16 time for RTX 4090 varies widely (18–41 s) depending on whether torch.compile is active and whether the VRAM pressure forces any CPU offloading. FP8 with --fast is the practical default on 24 GB cards — it fits cleanly, the quality delta from FP16 is undetectable at normal viewing distances, and the 9–10 second generation time is genuinely workflow-usable.
The RTX 3090 FP8 estimate (~14–18 s) is derived from community reports of the 3090 running approximately 40–45% slower than the 4090 per iteration, consistent with multiple benchmark sources.
Flux.1 Schnell at 1024×1024, 4 steps
| GPU | Precision | Sec/Image |
|---|---|---|
| RTX 4090 | FP8 | ~4–5 s |
| RTX 3090 | FP8 | ~6–8 s |
Schnell at 4 steps (~4–5 s) lands within roughly 2× of SDXL at 20 steps (2.6 s) in pure generation time on the RTX 4090. Quality isn’t SDXL-equivalent: it’s better in photorealism, weaker in fine-grained compositional control where SDXL’s ecosystem of refined samplers and CFG schedules still has an edge. For prompt-exploration workflows where you’re running 50+ generations to find the right composition, Schnell makes Flux economically viable on a 3090.
SD 1.5 at 512×512, 50 steps
| GPU | it/s | Sec/Image |
|---|---|---|
| RTX 4090 | ~37.6 | ~1.3 s |
| RTX 3090 | ~18.8 | ~2.7 s |
SD 1.5’s native resolution is 512×512. At that resolution and 50 steps, the RTX 4090 generates roughly 46 images per minute. The gap over SDXL and Flux in raw throughput is dramatic. For workflows that require hundreds of iterations — LoRA testing, prompt engineering sessions, batch rendering concept grids — SD 1.5’s speed advantage is real and meaningful.
The Electricity Math
At $0.182/kWh (US residential average, EIA 2026 forecast) and official NVIDIA TDPs (RTX 4090: 450W, RTX 3090: 350W):
Formula: cost = (seconds/image × 1000 images ÷ 3600) × (TDP_kW) × ($/kWh)
| Model | GPU | Sec/Image | TDP | Cost / 1,000 Images |
|---|---|---|---|---|
| SD 1.5 | RTX 4090 (450W) | 1.3 s | 450W | $0.030 |
| SD 1.5 | RTX 3090 (350W) | 2.7 s | 350W | $0.048 |
| SDXL | RTX 4090 (450W) | 2.6 s | 450W | $0.060 |
| SDXL | RTX 3090 (350W) | 5.5 s | 350W | $0.097 |
| Flux Schnell | RTX 4090 (450W) | 4.5 s | 450W | $0.102 |
| Flux Schnell | RTX 3090 (350W) | 7.0 s | 350W | $0.124 |
| Flux Dev (FP8) | RTX 4090 (450W) | 10 s | 450W | $0.228 |
| Flux Dev (FP8) | RTX 3090 (350W) | 16 s | 350W | $0.284 |
Three things stand out:
1. Electricity is not the cost driver — hardware is. Even running Flux Dev on an RTX 3090 at full throughput 24/7 for a month produces roughly 162,000 images and costs about $46 in electricity. The GPU purchase is always the dominant number.
2. Flux Schnell on an RTX 4090 costs roughly the same electricity per image as SDXL on an RTX 3090 ($0.102 vs $0.097 per 1,000 images). The 4090 draws 100 W more, but it finishes a Schnell image in 4.5 seconds versus 5.5 seconds for SDXL on the 3090, which largely cancels out its higher TDP.
3. The gap from SDXL to Flux Dev is real. At 10 seconds per image versus 2.6 seconds, Flux Dev takes 3.8× longer on the same 4090, which translates to 3.8× the electricity cost. For 10,000 images monthly, that’s $2.28 vs $0.60 in electricity — not consequential on its own, but multiply by years and it adds up.
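The formula above is easy to rerun with your own electricity rate or timings. A minimal sketch, assuming the GPU draws its full TDP for the entire generation (real draw during inference is often lower, so treat the results as upper-end figures); the function name is illustrative:

```python
def cost_per_1000_images(sec_per_image: float, tdp_watts: float,
                         usd_per_kwh: float = 0.182) -> float:
    """Electricity cost in USD for 1,000 images at full TDP."""
    hours = sec_per_image * 1000 / 3600   # GPU-hours for 1,000 images
    kwh = hours * tdp_watts / 1000        # energy used at full TDP
    return kwh * usd_per_kwh


# Two rows from the table above
print(f"SDXL, RTX 3090:     ${cost_per_1000_images(5.5, 350):.3f}")
print(f"Flux Dev, RTX 4090: ${cost_per_1000_images(10, 450):.3f}")
```

To model your own utility rate, pass `usd_per_kwh` explicitly, for example `cost_per_1000_images(10, 450, usd_per_kwh=0.30)`.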
VRAM Tiers: What You Can Actually Run
The VRAM question isn’t just about whether a model fits — it’s about whether it fits at a speed that matches your workflow.
12 GB cards (RTX 3060 12GB, RTX 4070 12GB): SD 1.5 at full speed. SDXL runs but benefits from 16 GB headroom, especially with ControlNet or a refiner loaded simultaneously. Flux requires GGUF Q5 or lower quantization and will use CPU offloading for the text encoders, so expect 30–60 seconds per image. Usable for final production renders, impractical for iterative workflows.
ComfyUI’s Dynamic VRAM system (released March 2026) improved the 12 GB Flux experience by reducing peak RAM pressure, but it doesn’t change the fundamental compute bottleneck. The 3060 12GB is still a solid SDXL card — it’s a slow Flux card.
16 GB cards (RTX 4060 Ti 16GB, RTX 4080 16GB): SDXL runs comfortably at any sampler setting. Flux FP8 fits at 12–14 GB, so 16 GB gives clear headroom without offloading. Speed depends heavily on the card: scaling the 4090’s 9–10 seconds by relative memory bandwidth (4060 Ti 16GB at 288 GB/s vs 4090 at 1,008 GB/s, a 3.5× gap) puts Flux Dev at 20 steps at roughly 30–35 seconds per image on an RTX 4060 Ti 16GB. Treat that as a rough estimate rather than a measurement.
24 GB cards (RTX 3090, RTX 4090): Full Flux FP8 without compromise. The 3090 is the minimum card where Flux Dev becomes a genuine production tool rather than an experiment.
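The tiers above reduce to a simple lookup. The sketch below is one way to encode them; the thresholds are approximations read off this article's tables and text, not hard limits, and `TIERS` and `options_for` are illustrative names:

```python
# (minimum VRAM in GB, what becomes practical at that tier)
# Thresholds are approximations taken from the tiers described above.
TIERS = [
    (4, "SD 1.5 (FP16)"),
    (8, "SDXL (FP16)"),
    (12, "Flux GGUF Q5 or lower, text encoders offloaded to CPU (slow)"),
    (16, "Flux FP8 without offloading"),
    (24, "Flux FP16"),
]


def options_for(vram_gb: float) -> list[str]:
    """Return every option whose minimum VRAM the card meets."""
    return [name for min_gb, name in TIERS if vram_gb >= min_gb]


for vram in (8, 12, 16, 24):
    print(f"{vram} GB: {', '.join(options_for(vram))}")
```

Fitting in memory is only half the question: a model that fits with offloading may still be too slow for iterative work, as the 12 GB tier shows.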
Per-Use-Case Decision Matrix
| Use Case | Recommended Model | Why |
|---|---|---|
| Rapid iteration / prompt exploration | SD 1.5 | 46 images/min on RTX 4090. Cheapest electricity per image by roughly 2–8×. |
| LoRA-heavy character work | SD 1.5 or SDXL | SD 1.5 has deepest fine-tune library; SDXL for native 1024px quality |
| Photorealistic portraits / scenes | Flux.1 Dev or Schnell | Hands, eyes, lighting depth that SDXL can’t reach |
| Text inside images (logos, signs, covers) | Flux.1 only | SD 1.5 and SDXL cannot produce reliably readable in-image text |
| Batch production (1,000+ images) | SD 1.5 or SDXL | Volume workflows need the speed advantage; Flux electricity cost multiplies |
| 12 GB VRAM | SD 1.5, SDXL FP16 | Flux works via GGUF but too slow for iteration |
| 24 GB GPU | SDXL for speed, Flux Dev for quality | No compromises on either |
Local vs Cloud: When Does Midjourney Win?
Midjourney Standard in 2026 costs $30/month for approximately 900 fast-mode images — roughly $0.033 per image. For light users, that pricing is hard to beat.
The break-even against a used RTX 3090 running SDXL:
| Monthly Volume | Midjourney Cost | RTX 3090 Monthly Cost* | Winner |
|---|---|---|---|
| 300 images/mo | $30 | ~$19 (hw amort.) + $0.03 elec. = $19.03 | Local (slight) |
| 500 images/mo | $30 | ~$19 + $0.05 = $19.05 | Local |
| 1,000 images/mo | $33 (fast hours) | ~$19 + $0.10 = $19.10 | Local |
| 5,000 images/mo | ~$55 (Pro plan) | ~$19 + $0.49 = $19.49 | Local clearly |
*Hardware amortization: used RTX 3090 at ~$680 ÷ 36 months = $18.89/month. Electricity at $0.097/1,000 images (SDXL on 3090).
The surprising finding: even at 300 images per month, a used RTX 3090 comes out about $11 per month cheaper than Midjourney, because the hardware amortizes so cheaply. The real Midjourney advantage is zero upfront cost and zero maintenance. If you already own a 3090 for gaming, local SDXL is cheaper from the first image you generate.
The calculus shifts when you add Flux Dev. That $0.284/1,000 images on the 3090 adds up at production scale, and Midjourney’s quality for portrait and environmental photography is genuinely competitive with Flux Schnell (though not Dev). For users who generate 100–200 images per month and want the best quality with zero hardware overhead, cloud still makes sense.
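The amortization arithmetic behind the comparison can be sketched as follows, using the footnote's inputs (a $680 used RTX 3090 over 36 months) and the per-1,000-image electricity figures from the electricity table. The function name and defaults are illustrative, and the unrounded $18.89 amortization is used rather than ~$19:

```python
def local_monthly_cost(images_per_month: int,
                       elec_usd_per_1000: float = 0.097,  # SDXL on RTX 3090
                       gpu_price_usd: float = 680.0,      # used RTX 3090
                       amort_months: int = 36) -> float:
    """Amortized hardware plus electricity for one month of local generation."""
    hardware = gpu_price_usd / amort_months
    electricity = images_per_month / 1000 * elec_usd_per_1000
    return hardware + electricity


for volume in (300, 500, 1000, 5000):
    sdxl = local_monthly_cost(volume)
    flux_dev = local_monthly_cost(volume, elec_usd_per_1000=0.284)
    print(f"{volume:>5} images/mo: SDXL ${sdxl:.2f}, Flux Dev ${flux_dev:.2f}")
```

Changing `amort_months` is the most sensitive knob here: a shorter planned ownership period raises the monthly hardware term far more than any change in image volume does.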
SD 1.5 vs SDXL Quality: The Nuance
The intuitive answer — SDXL is better — is incomplete. Base SDXL at 1024×1024 produces more coherent images and better default lighting than base SD 1.5. But SD 1.5’s ecosystem of fine-tuned checkpoints (Dreamshaper 8, Realistic Vision, Photon, etc.) spent five years solving the anatomy and photorealism problems that base SDXL still exhibits. For specific subjects where a mature SD 1.5 checkpoint exists, that fine-tuned model often outperforms base SDXL in the areas users care about most.
SDXL’s own fine-tune ecosystem is deep and continues growing — Civitai’s SDXL section now rivals SD 1.5 in model count. But if you have a workflow that depends on a specific SD 1.5 subject-matter LoRA that hasn’t been ported to SDXL, that ecosystem lock-in is real.
Flux’s fine-tune library arrived fast. Apache 2.0 licensing (Schnell) and Black Forest Labs’ active community engagement drove rapid adoption. Flux LoRAs for common subjects — portraits, product photography, architectural styles — are now widely available and high quality.
Honest Take
The right model-GPU pairing depends on two variables: your VRAM ceiling and your generation volume.
12 GB of VRAM or less: SD 1.5 is your speed engine. SDXL is your quality ceiling. Flux is a viable special-purpose tool for final renders but not for the iterative phase.
RTX 3090 (used, ~$680 in May 2026 per eBay completed listings — see our RTX 3090 value analysis for detailed pricing breakdown): The most practical card for running all three model families. SDXL at 5.5 seconds is comfortable. Flux Dev FP8 at ~16 seconds is usable for production work but slow for iteration. Flux Schnell at 6–8 seconds splits the difference.
RTX 4090: The only consumer card where Flux Dev becomes an actual iterative workflow tool rather than a “generate and wait” experience. If your primary work is photorealistic image generation, the 4090 is the hardware tier that makes Flux practical at scale. If you primarily run SDXL, the 3090 delivers about half the 4090’s throughput (3.61 vs 7.61 it/s) at a third of the price.
No 24 GB GPU yet but need Flux Dev quality now: Renting an RTX 4090 on RunPod runs $0.34/hr (Community Cloud) or $0.69/hr (Secure Cloud) as of May 2026. At 10 seconds per Flux Dev image, you generate roughly 360 images per GPU-hour. That works out to approximately $0.00094 per image on Community Cloud. This is about 4× the electricity-only cost of running your own 4090 ($0.00023 per image), but you pay nothing for hardware and only for the compute time you actually use. For a one-time project of 500–1,000 Flux Dev images, that is roughly $0.50–$1 of GPU time, so renting beats buying outright by a wide margin.
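The rental arithmetic is a one-liner worth keeping around when hourly rates change. A minimal sketch, assuming the pod generates back-to-back for the whole billed period (startup, model download, and idle time are ignored); the function name is illustrative:

```python
def rental_cost_per_image(usd_per_hour: float, sec_per_image: float) -> float:
    """Cost per image on a rented GPU that generates continuously."""
    images_per_hour = 3600 / sec_per_image
    return usd_per_hour / images_per_hour


# RunPod RTX 4090 rates quoted above, Flux Dev FP8 at 10 s/image
for tier, rate in (("Community Cloud", 0.34), ("Secure Cloud", 0.69)):
    per_image = rental_cost_per_image(rate, 10)
    print(f"{tier}: ${per_image:.5f}/image, ${per_image * 1000:.2f} per 1,000")
```

Because idle minutes are billed too, the effective per-image cost on a real project will sit somewhat above these figures.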
Electricity cost is almost never the deciding factor for owned hardware. The per-image numbers are small enough that the hardware purchase determines your cost trajectory, not the power bill. The two numbers that actually matter: what does the GPU cost, and how many images do you need per hour?
For the full power-bill math and GPU-specific buying advice, see the related articles below.
Related:
- GPU Buying Guide for Local AI (2026) — full tier list by budget
- Power Bill Math: True Cost of a 24/7 AI Server at Home — how the electricity numbers compound
- RTX 3090 in 2026: Still the AI Value King? — detailed 3090 analysis
- RunPod vs Local GPU: When to Rent vs Buy — cloud vs local for heavier workloads
Sources
- GPU Benchmark — SDXL Across GPUs (ComfyUI Discussion #2970)
- RTX 4090 Flux.1 Benchmarks (ComfyUI Discussion #4571)
- Stable Diffusion Benchmarks: 45 Nvidia and AMD GPUs Compared — Tom’s Hardware
- Electric Power Monthly — U.S. Energy Information Administration (EIA)
- May 2026 Short-Term Energy Outlook — EIA (18.2¢/kWh residential forecast)
- NVIDIA RTX 4090 Power Analysis (450W TDP) — Hardware Busters
- RTX 3090 Maximum Power Draw (350W TDP) — NVIDIA Developer Forums
- RTX 3060 TDP 170W — Oreate AI Analysis
- Midjourney Pricing 2026: Plans and Cost per Image — CostBench
- Image Generation VRAM Requirements 2026: Flux, SDXL, SD 3.5 — WillItRunAI
- RunPod RTX 4090 Pricing — RunPod ($0.34/hr Community, $0.69/hr Secure Cloud, May 2026)
Last updated May 19, 2026. GPU prices, cloud API rates, and EIA electricity averages change regularly — verify current data before purchasing.