May 19, 2026

Flux vs SDXL vs SD 1.5: Real Cost-per-Image Across GPUs (2026)

By RunAIHome Team · 14 min read


Three generations of image models now live in a typical ComfyUI installation (Windows users: see our ComfyUI Windows setup guide), and the choice between them isn’t obvious. SD 1.5 still commands the deepest fine-tune ecosystem ever built around a single model. SDXL is the default backbone for most home-lab artists. Flux.1 produces images that read as professional photography, and it handles human hands, readable in-image text, and complex lighting in ways that SD 1.5 and SDXL can’t reliably match.

The tradeoff is hardware. Flux requires 12–24 GB VRAM and, at FP8, takes roughly 3–4× longer per image than SDXL on the same GPU (more at FP16). Whether that matters depends on how many images you generate per session and what GPU you’re running. This article quantifies those costs: verified generation times across two GPU tiers, converted into dollar-per-image electricity costs at the current US average of $0.1765/kWh (EIA February 2026), and a cloud comparison that tells you when a $30/month Midjourney subscription is still the smarter call.

The Three Models at a Glance

Model	Architecture	Parameters	Native Resolution	VRAM (FP16)	VRAM (FP8/GGUF)
SD 1.5	U-Net	860M	512×512	4–6 GB	—
SDXL 1.0	U-Net (dual text encoders)	3.5B	1024×1024	8–12 GB	—
Flux.1 Dev	DiT transformer	12B	1024×1024	~24 GB	12–14 GB
Flux.1 Schnell	DiT transformer	12B	1024×1024	~24 GB	12–14 GB

SD 1.5 and SDXL use U-Net architectures: compact, fast, designed for iterative denoising. Flux uses a Diffusion Transformer (DiT) architecture at 12 billion parameters. The quality jump is observable and consistent: Flux produces legible text in generated images, renders human anatomy with significantly fewer errors, and handles complex multi-element compositions more coherently. Neither SD 1.5 nor SDXL does any of these reliably.
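
The VRAM column in the table above tracks parameter count closely. As a rough sketch, weights-only memory is parameters times bytes per parameter; the estimate below assumes 2 bytes for FP16 and 1 byte for FP8 and deliberately ignores text encoders, the VAE, and activations, which is why real-world usage sits above these figures.

```python
# Weights-only VRAM estimate: parameters x bytes per parameter.
# Parameter counts come from the table above. Everything else a pipeline
# loads (text encoders, VAE, activations) is NOT counted here.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}

PARAMS = {
    "SD 1.5": 860e6,
    "SDXL 1.0": 3.5e9,
    "Flux.1 Dev": 12e9,
}


def weights_gb(params: float, precision: str) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


for name, params in PARAMS.items():
    print(f"{name}: {weights_gb(params, 'fp16'):.1f} GB at FP16, "
          f"{weights_gb(params, 'fp8'):.1f} GB at FP8")
```

For Flux this gives 24 GB at FP16 and 12 GB at FP8, which lines up with the ~24 GB and 12–14 GB entries in the table.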

Dev vs Schnell: Flux.1 Schnell uses knowledge distillation to produce usable images in 4 steps instead of the 20+ steps Flux Dev requires. Schnell is Apache 2.0 licensed; Dev carries a non-commercial research restriction. For personal home use, either is legally fine. Schnell is faster, but most users running quality-critical work prefer Dev at 20 steps for the added detail — especially for photorealistic subjects.

Raw Speed: Verified Benchmarks

The SDXL numbers below come from ComfyUI’s public benchmark thread (Discussion #2970), which aggregated community-submitted hardware results for SDXL 1.0 at 1024×1024, 20 steps in ComfyUI. The Flux.1 Dev numbers come from ComfyUI Discussion #4571 (RTX 4090 Flux benchmarks, multiple contributors). SD 1.5 timings are derived from Automatic1111 community benchmarks; the 4090 vs 3090 ratio is confirmed by Tom’s Hardware testing.

SDXL at 1024×1024, 20 steps

GPU	it/s	Sec/Image
RTX 3070 8 GB	2.26	8.8 s
RTX 3090 24 GB	3.61	5.5 s
RTX 4080 16 GB	3.53	5.7 s
RTX 4090 24 GB	7.61	2.6 s

The RTX 3090 and RTX 4080 16GB land within 3% of each other on SDXL — roughly equal inference speed despite the VRAM difference. The RTX 4090 pulls ~2× ahead.
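
The Sec/Image column is just the step count divided by iterations per second. A minimal sketch of that conversion, assuming sampling time dominates (model loading, text encoding, and VAE decode are ignored):

```python
STEPS = 20  # sampler steps used for the SDXL benchmark

# it/s values copied from the SDXL table above
sdxl_its = {
    "RTX 3070 8 GB": 2.26,
    "RTX 3090 24 GB": 3.61,
    "RTX 4080 16 GB": 3.53,
    "RTX 4090 24 GB": 7.61,
}


def seconds_per_image(its: float, steps: int = STEPS) -> float:
    """Sampling time only: steps divided by iterations per second."""
    return steps / its


for gpu, its in sdxl_its.items():
    print(f"{gpu}: {seconds_per_image(its):.1f} s")

# Relative gaps quoted in the text
gap_3090_vs_4080 = sdxl_its["RTX 3090 24 GB"] / sdxl_its["RTX 4080 16 GB"] - 1
speedup_4090 = sdxl_its["RTX 4090 24 GB"] / sdxl_its["RTX 3090 24 GB"]
print(f"3090 vs 4080 gap: {gap_3090_vs_4080:.1%}, 4090 vs 3090: {speedup_4090:.1f}x")
```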

Flux.1 Dev at 1024×1024, 20 steps

GPU	Precision	Sec/Image
RTX 4090	FP8 + `--fast`	9–10 s
RTX 4090	Q8 GGUF	15–17 s
RTX 4090	FP16 full	18–41 s
RTX 3090	FP8	~14–18 s

The FP16 time for RTX 4090 varies widely (18–41 s) depending on whether torch.compile is active and whether the VRAM pressure forces any CPU offloading. FP8 with --fast is the practical default on 24 GB cards — it fits cleanly, the quality delta from FP16 is undetectable at normal viewing distances, and the 9–10 second generation time is genuinely workflow-usable.

The RTX 3090 FP8 estimate (~14–18 s) is derived from community reports of the 3090 running approximately 40–45% slower than the 4090 per iteration, consistent with multiple benchmark sources.
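
"40–45% slower" can be read two ways, and the two readings bracket the estimate. The sketch below is an illustration of that arithmetic only, not a benchmark; both interpretations are assumptions about what the community reports mean.

```python
T_4090 = (9.0, 10.0)     # Flux Dev FP8 seconds/image on the RTX 4090 (table above)
DEFICIT = (0.40, 0.45)   # reported RTX 3090 deficit per iteration


def longer_time(t_fast: float, deficit: float) -> float:
    """Reading 1: each 3090 iteration takes (1 + deficit) times as long."""
    return t_fast * (1.0 + deficit)


def lower_throughput(t_fast: float, deficit: float) -> float:
    """Reading 2: the 3090 completes (1 - deficit) as many iterations per second."""
    return t_fast / (1.0 - deficit)


low = longer_time(T_4090[0], DEFICIT[0])        # most optimistic case
high = lower_throughput(T_4090[1], DEFICIT[1])  # most pessimistic case
print(f"{low:.1f} to {high:.1f} s per image")
```

The bracket comes out at about 12.6 to 18.2 seconds, so the ~14–18 s figure sits inside it under either reading.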

Flux.1 Schnell at 1024×1024, 4 steps

GPU	Precision	Sec/Image
RTX 4090	FP8	~4–5 s
RTX 3090	FP8	~6–8 s

Schnell at 4 steps takes roughly 4–5 seconds on the RTX 4090, within about 2× of SDXL at 20 steps (2.6 s). Quality isn’t SDXL-equivalent: it’s better in photorealism, weaker in fine-grained compositional control where SDXL’s ecosystem of refined samplers and CFG schedules still has an edge. For prompt-exploration workflows where you’re running 50+ generations to find the right composition, Schnell makes Flux economically viable on a 3090.

SD 1.5 at 512×512, 50 steps

GPU	it/s	Sec/Image
RTX 4090	~37.6	~1.3 s
RTX 3090	~18.8	~2.7 s

SD 1.5’s native resolution is 512×512. At that resolution and 50 steps, the RTX 4090 generates roughly 46 images per minute. The gap over SDXL and Flux in raw throughput is dramatic. For workflows that require hundreds of iterations — LoRA testing, prompt engineering sessions, batch rendering concept grids — SD 1.5’s speed advantage is real and meaningful.
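
To put the throughput gap in one place, the sketch below converts the RTX 4090 seconds-per-image figures from this section into images per minute. The Schnell value of 4.5 s is the midpoint of the 4–5 s range, an assumption; note also that SD 1.5 images here are 512×512, a quarter of the pixels of the 1024×1024 outputs.

```python
# Seconds per image on the RTX 4090, taken from the tables in this section.
sec_per_image_4090 = {
    "SD 1.5 (512x512, 50 steps)": 1.3,
    "SDXL (1024x1024, 20 steps)": 2.6,
    "Flux Schnell FP8 (4 steps)": 4.5,   # midpoint of the 4-5 s range
    "Flux Dev FP8 (20 steps)": 10.0,
}


def images_per_minute(seconds: float) -> float:
    """Sustained throughput, ignoring any pause between generations."""
    return 60.0 / seconds


for model, seconds in sec_per_image_4090.items():
    print(f"{model}: {images_per_minute(seconds):.0f} images/min")
```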

The Electricity Math

At $0.1765/kWh (US residential average, EIA February 2026) and official NVIDIA TDPs (RTX 4090: 450W, RTX 3090: 350W):

Formula: cost = (seconds/image × 1000 images ÷ 3600) × (TDP_kW) × ($/kWh)

Model	GPU	Sec/Image	TDP	Cost / 1,000 Images
SD 1.5	RTX 4090 (450W)	1.3 s	450W	$0.029
SD 1.5	RTX 3090 (350W)	2.7 s	350W	$0.046
SDXL	RTX 4090 (450W)	2.6 s	450W	$0.057
SDXL	RTX 3090 (350W)	5.5 s	350W	$0.094
Flux Schnell	RTX 4090 (450W)	4.5 s	450W	$0.099
Flux Schnell	RTX 3090 (350W)	7.0 s	350W	$0.120
Flux Dev (FP8)	RTX 4090 (450W)	10 s	450W	$0.221
Flux Dev (FP8)	RTX 3090 (350W)	16 s	350W	$0.275
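
The table can be reproduced from the formula in a few lines. This is a sketch under the same simplification the article uses: the GPU is assumed to draw its full rated TDP for the whole generation, and the rest of the system (CPU, PSU losses, idle time) is not counted.

```python
RATE_USD_PER_KWH = 0.1765  # EIA US residential average cited above


def cost_per_1000(sec_per_image: float, tdp_watts: float,
                  rate: float = RATE_USD_PER_KWH) -> float:
    """Electricity cost in USD for 1,000 images at full TDP."""
    hours = sec_per_image * 1000 / 3600
    return hours * (tdp_watts / 1000) * rate


rows = [
    ("SD 1.5", "RTX 4090", 1.3, 450),
    ("SD 1.5", "RTX 3090", 2.7, 350),
    ("SDXL", "RTX 4090", 2.6, 450),
    ("SDXL", "RTX 3090", 5.5, 350),
    ("Flux Schnell", "RTX 4090", 4.5, 450),
    ("Flux Schnell", "RTX 3090", 7.0, 350),
    ("Flux Dev (FP8)", "RTX 4090", 10.0, 450),
    ("Flux Dev (FP8)", "RTX 3090", 16.0, 350),
]

for model, gpu, sec, tdp in rows:
    print(f"{model:<15} {gpu}  ${cost_per_1000(sec, tdp):.3f}")
```

Swapping in your own electricity rate or a measured wall-power figure instead of TDP is a one-argument change.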

Three things stand out:

1. Electricity is not the cost driver — hardware is. Even running Flux Dev on an RTX 3090 at full throughput 24/7 for a month produces roughly 162,000 images and costs about $45 in electricity. The GPU purchase is always the dominant number.

2. Flux Schnell on an RTX 4090 costs roughly the same electricity per image as SDXL on an RTX 3090 ($0.099 vs $0.094 per 1,000). The 4090 finishes a Schnell image in 4.5 seconds against 5.5 seconds for SDXL on the 3090, which largely cancels out its higher TDP.

3. The gap from SDXL to Flux Dev is real. At 10 seconds per image versus 2.6 seconds, Flux Dev takes 3.8× longer on the same 4090, which translates to 3.8× the electricity cost. For 10,000 images monthly, that’s $2.21 vs $0.57 in electricity: not consequential on its own, but multiply by years and it adds up.
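
The arithmetic behind the three points above can be checked with a short sketch. It assumes a 30-day month and full-TDP draw, as elsewhere in this section; the function names are illustrative.

```python
RATE_USD_PER_KWH = 0.1765   # EIA average cited above
HOURS_PER_MONTH = 30 * 24   # assumes a 30-day month


def month_nonstop(sec_per_image: float, tdp_watts: float):
    """Images produced and electricity cost for one month of nonstop generation."""
    images = HOURS_PER_MONTH * 3600 / sec_per_image
    cost = HOURS_PER_MONTH * (tdp_watts / 1000) * RATE_USD_PER_KWH
    return images, cost


def joules_per_image(sec_per_image: float, tdp_watts: float) -> float:
    """Energy per image at full TDP (watts x seconds)."""
    return sec_per_image * tdp_watts


# Point 1: Flux Dev FP8 on an RTX 3090, running 24/7
images, cost = month_nonstop(16, 350)
print(f"{images:,.0f} images for ${cost:.2f}")

# Point 2: Schnell on the 4090 vs SDXL on the 3090, energy per image
print(joules_per_image(4.5, 450), joules_per_image(5.5, 350))

# Point 3: Flux Dev vs SDXL time ratio on the same 4090
print(f"{10 / 2.6:.1f}x")
```

The month of nonstop generation comes out at 162,000 images, and the two energy-per-image figures (2,025 J and 1,925 J) differ by about 5%.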

VRAM Tiers: What You Can Actually Run

The VRAM question isn’t just about whether a model fits — it’s about whether it fits at a speed that matches your workflow.

12 GB cards (RTX 3060 12GB, RTX 4070 12GB): SD 1.5 at full speed. SDXL runs but benefits from 16 GB headroom, especially with ControlNet or a refiner loaded simultaneously. Flux requires GGUF Q5 or lower quantization and will use CPU offloading for the text encoders; expect 30–60 seconds per image. Usable for final production renders, impractical for iterative workflows.

ComfyUI’s Dynamic VRAM system (released March 2026) improved the 12 GB Flux experience by reducing peak RAM pressure, but it doesn’t change the fundamental compute bottleneck. The 3060 12GB is still a solid SDXL card — it’s a slow Flux card.

16 GB cards (RTX 4060 Ti 16GB, RTX 4080 16GB): SDXL runs comfortably at any sampler setting. Flux FP8 fits at 12–14 GB, so 16 GB gives clear headroom without offloading. No 4060 Ti benchmark is included here; scaling the 4090’s 9–10 seconds by relative memory bandwidth (288 GB/s on the 4060 Ti 16GB vs 1,008 GB/s on the 4090, about 3.5×) suggests roughly 30–35 seconds per image for Flux Dev at 20 steps, which should be treated as a rough estimate.

24 GB cards (RTX 3090, RTX 4090): Full Flux FP8 without compromise. The 3090 is the minimum card where Flux Dev becomes a genuine production tool rather than an experiment.
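
The three tiers can be condensed into a small lookup. This is only a sketch that paraphrases the thresholds and notes above; the `Tier` type and `tier_for` helper are illustrative names, and cards below 12 GB are outside what this section covers.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    min_vram_gb: int
    sdxl: str
    flux: str


# Ordered from largest to smallest; notes paraphrase the tiers described above.
TIERS = [
    Tier(24, "comfortable", "FP8 without compromise"),
    Tier(16, "comfortable at any sampler setting",
         "FP8 fits (12-14 GB) without offloading"),
    Tier(12, "runs; tight with ControlNet or a refiner",
         "GGUF Q5 or lower with CPU offloading, 30-60 s/image"),
]


def tier_for(vram_gb: int) -> Optional[Tier]:
    """Return the highest tier the card qualifies for, or None below 12 GB."""
    for tier in TIERS:
        if vram_gb >= tier.min_vram_gb:
            return tier
    return None


for vram in (12, 16, 24):
    tier = tier_for(vram)
    print(f"{vram} GB -> SDXL: {tier.sdxl}; Flux: {tier.flux}")
```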

Per-Use-Case Decision Matrix

Use Case	Recommended Model	Why
Rapid iteration / prompt exploration	SD 1.5	46 images/min on RTX 4090. Cheapest electricity per image by 2–8×.
LoRA-heavy character work	SD 1.5 or SDXL	SD 1.5 has deepest fine-tune library; SDXL for native 1024px quality
Photorealistic portraits / scenes	Flux.1 Dev or Schnell	Hands, eyes, lighting depth that SDXL can’t reach
Text inside images (logos, signs, covers)	Flux.1 only	SD 1.5 and SDXL cannot produce reliably readable in-image text
Batch production (1,000+ images)	SD 1.5 or SDXL	Volume workflows need the speed advantage; Flux electricity cost multiplies
12 GB VRAM	SD 1.5, SDXL FP16	Flux works via GGUF but too slow for iteration
24 GB GPU	SDXL for speed, Flux Dev for quality	No compromises on either

Local vs Cloud: When Does Midjourney Win?

Midjourney Standard in 2026 costs $30/month for approximately 900 fast-mode images — roughly $0.033 per image. For light users, that pricing is hard to beat.

The break-even against a used RTX 3090 running SDXL:

Monthly Volume	Midjourney Cost	RTX 3090 Monthly Cost*	Winner
300 images/mo	$30	$18.89 (hw amort.) + $0.03 elec. = $18.92	Local
500 images/mo	$30	$18.89 + $0.05 = $18.94	Local
1,000 images/mo	$33 (fast hours)	$18.89 + $0.09 = $18.98	Local
5,000 images/mo	~$55 (Pro plan)	$18.89 + $0.47 = $19.36	Local clearly

*Hardware amortization: used RTX 3090 at ~$680 ÷ 36 months = $18.89/month. Electricity at $0.094/1,000 images (SDXL on 3090).
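
The local column is hardware amortization plus per-image electricity. The sketch below recomputes it from the footnote's inputs, taking the $0.094 per 1,000 images figure for SDXL on the 3090 from the electricity table; the 36-month amortization window is the footnote's assumption, and resale value is ignored.

```python
GPU_PRICE_USD = 680.0       # used RTX 3090, from the footnote
AMORT_MONTHS = 36           # amortization window, from the footnote
ELEC_PER_1000_USD = 0.094   # SDXL on RTX 3090, from the electricity table


def local_monthly_cost(images: int) -> float:
    """Monthly cost of local generation: amortized hardware plus electricity."""
    hardware = GPU_PRICE_USD / AMORT_MONTHS
    electricity = images / 1000 * ELEC_PER_1000_USD
    return hardware + electricity


for n in (300, 500, 1000, 5000):
    print(f"{n:>5} images/mo: ${local_monthly_cost(n):.2f} local")
```

Because electricity is a few cents per thousand images, the monthly total barely moves with volume; the amortized hardware cost is almost the whole bill.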

The surprising finding: even at 300 images per month, a used RTX 3090 is cheaper than Midjourney (about $19 versus $30), because the hardware amortizes so cheaply. The real Midjourney advantage is zero upfront cost and zero maintenance. If you already own a 3090 for gaming, local SDXL is cheaper from the first image you generate.

The calculus shifts when you add Flux Dev. That $0.275/1,000 images on the 3090 adds up at production scale, and Midjourney’s quality for portrait and environmental photography is genuinely competitive with Flux Schnell (though not Dev). For users who generate 100–200 images per month and want the best quality with zero hardware overhead, cloud still makes sense.

SD 1.5 vs SDXL Quality: The Nuance

The intuitive answer, that SDXL is better, is incomplete. Base SDXL at 1024×1024 produces more coherent images and better default lighting than base SD 1.5. But SD 1.5’s ecosystem of fine-tuned checkpoints (Dreamshaper 8, Realistic Vision, Photon, etc.) has spent more than three years solving the anatomy and photorealism problems that base SDXL still exhibits. For specific subjects where a mature SD 1.5 checkpoint exists, that fine-tuned model often outperforms base SDXL in the areas users care about most.

SDXL’s own fine-tune ecosystem is deep and continues growing — Civitai’s SDXL section now rivals SD 1.5 in model count. But if you have a workflow that depends on a specific SD 1.5 subject-matter LoRA that hasn’t been ported to SDXL, that ecosystem lock-in is real.

Flux’s fine-tune library arrived fast. Apache 2.0 licensing (Schnell) and Black Forest Labs’ active community engagement drove rapid adoption. Flux LoRAs for common subjects — portraits, product photography, architectural styles — are now widely available and high quality.

Honest Take

The right model-GPU pairing depends on two variables: your VRAM ceiling and your generation volume.

12 GB of VRAM or less: SD 1.5 is your speed engine. SDXL is your quality ceiling. Flux is a viable special-purpose tool for final renders but not for the iterative phase.

RTX 3090 (used, ~$680 in May 2026 per eBay completed listings — see our RTX 3090 value analysis for detailed pricing breakdown): The most practical card for running all three model families. SDXL at 5.5 seconds is comfortable. Flux Dev FP8 at ~16 seconds is usable for production work but slow for iteration. Flux Schnell at 6–8 seconds splits the difference.

RTX 4090: The only consumer card where Flux Dev becomes an actual iterative workflow tool rather than a “generate and wait” experience. If your primary work is photorealistic image generation, the 4090 is the hardware tier that makes Flux practical at scale. If you primarily run SDXL, the 3090 delivers about half the 4090’s throughput (3.61 vs 7.61 it/s) with the same 24 GB of VRAM at a third of the price.

No 24 GB GPU yet but need Flux Dev quality now: Renting an RTX 4090 on RunPod runs $0.34/hr (Community Cloud) or $0.69/hr (Secure Cloud) as of May 2026. At 10 seconds per Flux Dev image, you generate roughly 360 images per GPU-hour. That works out to approximately $0.00094 per image on Community Cloud. That is about four times the electricity cost per image of an owned 4090 ($0.00022), but there is no hardware to buy and you only pay for the compute time you actually use. For a one-time project of 500–1,000 Flux Dev images (under $1 of Community Cloud time), renting beats buying outright by a wide margin.
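
A quick sketch of the rental arithmetic, using the hourly rates and the 10 seconds per image quoted above. It counts billed generation time only; pod startup, model download, and storage fees are assumptions left out of the estimate.

```python
# Hourly rates quoted above for a rented RTX 4090
RENTAL_USD_PER_HOUR = {"Community Cloud": 0.34, "Secure Cloud": 0.69}
SEC_PER_IMAGE = 10.0  # Flux Dev FP8 on an RTX 4090

images_per_hour = 3600 / SEC_PER_IMAGE


def rental_cost(n_images: int, usd_per_hour: float) -> float:
    """Cost of the GPU time needed to generate n_images back to back."""
    return n_images / images_per_hour * usd_per_hour


for tier, rate in RENTAL_USD_PER_HOUR.items():
    per_image = rate / images_per_hour
    print(f"{tier}: ${per_image:.5f}/image, "
          f"${rental_cost(500, rate):.2f} for 500, "
          f"${rental_cost(1000, rate):.2f} for 1,000")
```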

Electricity cost is almost never the deciding factor for owned hardware. The per-image numbers are small enough that the hardware purchase determines your cost trajectory, not the power bill. The two numbers that actually matter: what does the GPU cost, and how many images do you need per hour?

For the full power-bill math and GPU-specific buying advice, see the related articles below.

Related:

GPU Buying Guide for Local AI (2026) — full tier list by budget
Power Bill Math: True Cost of a 24/7 AI Server at Home — how the electricity numbers compound
RTX 3090 in 2026: Still the AI Value King? — detailed 3090 analysis
RunPod vs Local GPU: When to Rent vs Buy — cloud vs local for heavier workloads



Last updated May 19, 2026. GPU prices, cloud API rates, and EIA electricity averages change regularly — verify current data before purchasing.
