Flux vs SDXL vs SD 1.5: Cost-per-Image Comparison Across GPUs (2026)


The question isn’t which model looks best in a side-by-side. Flux beats SDXL beats SD 1.5 on output quality — that hierarchy is settled. The question is what that quality upgrade actually costs you in GPU time, electricity, and cloud rental, because the spread is wide enough to change whether you buy hardware or rent it.

At 1,000 images a month for personal use, the difference is noise. At 50,000 product mockups or a fine-tuning dataset, the arithmetic matters: SD 1.5 on a rented RTX 4090 costs roughly $0.19 per 1,000 images; Flux.1 Dev on the same hardware costs $1.70 per 1,000. Nine times more. That changes the build-vs-rent math completely.

This article puts numbers on that gap: VRAM requirements, generation speed on three GPU tiers, electricity cost per 1,000 images at the US average rate, and the cloud rental cost on RunPod. Then it tells you which model makes sense at which volume.


The three models in 30 seconds

SD 1.5 (Stability AI, 2022): The original 860M-parameter UNet. Native output is 512×512 or 768×768. VRAM floor is 4 GB. Still the fastest option on consumer hardware by a wide margin, and the one to use if your main constraint is throughput or you’re stuck with an 8 GB card.

SDXL 1.0 (Stability AI, 2023): Scaled up to 3.5B parameters with a native 1024×1024 output. Noticeably better composition, text rendering, and detail retention at that resolution. Uses the same UNet architecture, so the speed model is similar — just slower due to the larger resolution and model size.

Flux.1 (Black Forest Labs, 2024): A 12B-parameter Diffusion Transformer (DiT). Two locally runnable variants:

  • Schnell — 4-step distilled model, Apache 2.0 license. Commercial use permitted.
  • Dev — 25–50 step version for best quality, non-commercial license only.

The critical architectural fact: each Flux step requires approximately 5× more floating-point operations than an equivalent SD 1.5 step. “Flux.1 Schnell uses only 4 steps” sounds like it should be much faster than SD 1.5 at 50 steps. In practice on current hardware, each of those 4 steps is heavy enough that Schnell is no faster than SDXL at 30 steps: roughly 4–6 seconds per image versus about 3.2 on an RTX 4090. Flux.1 Dev at 25–50 steps is consistently the slowest of the three on any given GPU.


VRAM requirements

This table determines which models you can actually run before any discussion of speed.

Model | Minimum VRAM | Practical VRAM | Notes
SD 1.5 (512×512) | 4 GB | 6 GB | Runs on GTX 1080 with attention slicing
SD 1.5 (768×768) | 5 GB | 8 GB | Needs attention slicing on 6 GB cards
SDXL base (1024×1024) | 6 GB | 8 GB | 6 GB workable with xformers + attention slicing
SDXL + refiner | 8 GB | 12 GB | Both models don’t fit simultaneously on 8 GB
Flux.1 Schnell/Dev (FP16) | 24 GB | 24 GB | Full precision; RTX 3090 or 4090 only
Flux.1 Schnell/Dev (FP8) | 12–15 GB | 16 GB | Best quality-per-VRAM tradeoff; runs on RTX 4060 Ti 16GB
Flux.1 Schnell/Dev (Q4 GGUF) | 6–8 GB | 8 GB | Fits RTX 3060 12GB; visible quality drop vs FP8

An RTX 3060 12GB can generate SD 1.5 images at full speed, SDXL images with some patience, and Flux only at Q4 GGUF — which is a different product visually from what you see in Flux demos. If Flux quality is the goal, 16 GB VRAM (for FP8) or 24 GB (for FP16) is the real entry point.
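As a rough illustration, the VRAM table can be turned into a lookup that tells you what a given card can run. This is only a sketch: the function name, the three labels ("comfortable", "tight", "no"), and the choice to use the lower bound of ranges such as 12–15 GB as the minimum are assumptions made for the example, not part of the measurements.

```python
# (configuration, minimum VRAM in GB, practical VRAM in GB), copied from the table.
# Assumption: for ranges like "12-15 GB" the lower bound is used as the minimum.
VRAM_TABLE = [
    ("SD 1.5 (512x512)", 4, 6),
    ("SD 1.5 (768x768)", 5, 8),
    ("SDXL base (1024x1024)", 6, 8),
    ("SDXL + refiner", 8, 12),
    ("Flux.1 Schnell/Dev (FP16)", 24, 24),
    ("Flux.1 Schnell/Dev (FP8)", 12, 16),
    ("Flux.1 Schnell/Dev (Q4 GGUF)", 6, 8),
]


def classify(vram_gb):
    """Label each configuration for a card with vram_gb of VRAM.

    'comfortable' = meets the practical figure,
    'tight'       = meets only the minimum,
    'no'          = below the minimum.
    """
    result = {}
    for name, minimum, practical in VRAM_TABLE:
        if vram_gb >= practical:
            result[name] = "comfortable"
        elif vram_gb >= minimum:
            result[name] = "tight"
        else:
            result[name] = "no"
    return result


if __name__ == "__main__":
    for name, verdict in classify(12).items():
        print(f"{name:<30} {verdict}")
```

For a 12 GB card this marks FP8 Flux as "tight" and FP16 Flux as "no", which matches the point above that 16 GB is the realistic entry point for FP8.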

For a full breakdown of what VRAM tier covers which models, see the GPU buying guide for local AI.


Speed benchmarks by GPU tier

These figures are for ComfyUI with xformers enabled, 1024×1024 output where applicable, measured at steady-state (not first-run with model loading).

RTX 4090 (24 GB VRAM, 450W TBP)

Model | Steps | Seconds/image | Images/hour
SD 1.5 | 50 | ~2 sec | ~1,800
SDXL | 30 | ~3.2 sec | ~1,125
Flux.1 Schnell (FP8) | 4 | ~4–6 sec | ~600–900
Flux.1 Dev (FP16) | 25 | ~18 sec | ~200

The RTX 4090 is the only consumer card that runs Flux.1 Dev at native FP16 with reasonable wait times. At 18 seconds per 1024×1024 image, it’s still usable for iterating on prompts — but you feel that gap when you’re used to SD 1.5 spitting out 30 images per minute.
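The Images/hour column is simply 3,600 divided by seconds per image. A minimal sketch of that conversion follows, assuming steady-state generation with no model loading or queueing overhead; the function name is made up for the example.

```python
def images_per_hour(seconds_per_image):
    """Steady-state throughput: 3,600 seconds in an hour / seconds per image."""
    return 3600 / seconds_per_image


if __name__ == "__main__":
    # Seconds-per-image figures from the RTX 4090 table
    for model, seconds in [("SD 1.5", 2), ("SDXL", 3.2), ("Flux.1 Dev (FP16)", 18)]:
        print(f"{model:<18} {images_per_hour(seconds):7.0f} images/hour")
```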

RTX 3090 (24 GB VRAM, 350W TBP)

Model | Steps | Seconds/image | Images/hour
SD 1.5 | 50 | ~3.3 sec | ~1,090
SDXL | 30 | ~5–6 sec | ~600–720
Flux.1 Schnell (FP16) | 4 | ~19 sec | ~190
Flux.1 Dev (FP16) | 25 | ~25–30 sec | ~120–145

The RTX 3090 is 46% slower than the RTX 4090 as a median across benchmark suites — the gap widens on Flux due to the lower memory bandwidth (936 GB/s vs 1,008 GB/s). The 3090 can run Flux.1 Dev at full FP16 precision thanks to its 24 GB VRAM, but you’re looking at half a minute per image. That’s workable for overnight batch runs; painful for iterative prompting.

See the used RTX 3090 evaluation for current street prices, which as of May 2026 are running around $680 used on eBay.

RTX 3060 12GB (12 GB VRAM, 170W TBP)

Model | Steps | Seconds/image | Images/hour
SD 1.5 | 50 | ~6–8 sec | ~450–600
SDXL | 30 | ~25–35 sec | ~100–145
Flux.1 Q4 GGUF | 4 | ~60–90 sec | ~40–60
Flux.1 Dev Q4 GGUF | 25 | 3–5 min/image | ~12–20

The RTX 3060 12GB draws the hard line: it is a capable SD 1.5 and SDXL machine but hits a wall with Flux. Q4 GGUF lets you run Flux technically, but the image quality is not representative of what Flux.1 can produce, and the generation time makes it impractical for any volume. If Flux is part of your workflow, this card is the ceiling you’ve hit.


Electricity cost per 1,000 images

US average residential electricity rate: $0.1765/kWh (EIA data, February 2026).

GPU power draw during image generation runs at roughly 80–90% of rated TBP. The table uses conservative estimates for SD 1.5 and SDXL (RTX 4090 at 350W, RTX 3090 at 300W, RTX 3060 at 150W) and higher figures for the Flux rows, which assume 400W on the RTX 4090 and 330W on the RTX 3090.

Formula: cost per 1,000 images = (watts / 1000) × (seconds_per_image × 1000 / 3600) × $0.1765, where the first two factors together give kWh per 1,000 images.

Model | GPU | Sec/image | Power | kWh per 1K images | Cost per 1K images
SD 1.5 | RTX 4090 | 2 | 350W | 0.194 | $0.034
SD 1.5 | RTX 3090 | 3.3 | 300W | 0.275 | $0.049
SD 1.5 | RTX 3060 | 7 | 150W | 0.292 | $0.051
SDXL | RTX 4090 | 3.2 | 350W | 0.311 | $0.055
SDXL | RTX 3090 | 5.5 | 300W | 0.458 | $0.081
SDXL | RTX 3060 | 30 | 150W | 1.25 | $0.221
Flux.1 Schnell (FP8) | RTX 4090 | 5 | 400W | 0.556 | $0.098
Flux.1 Dev (FP16) | RTX 4090 | 18 | 400W | 2.0 | $0.353
Flux.1 Dev (FP16) | RTX 3090 | 28 | 330W | 2.57 | $0.453
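To rerun this table with your own electricity rate or measured power draw, the formula can be sketched in a few lines of Python. The function names are illustrative, and the calculation assumes constant power draw for the whole run and ignores idle time and the rest of the system.

```python
RATE_USD_PER_KWH = 0.1765  # US average residential rate quoted in this section


def kwh_per_1k_images(watts, seconds_per_image):
    """Energy in kWh to generate 1,000 images at a constant power draw."""
    hours_per_1k = seconds_per_image * 1000 / 3600
    return (watts / 1000) * hours_per_1k


def electricity_cost_per_1k(watts, seconds_per_image, rate=RATE_USD_PER_KWH):
    """Electricity cost in USD for 1,000 images."""
    return kwh_per_1k_images(watts, seconds_per_image) * rate


if __name__ == "__main__":
    # (model, GPU, seconds per image, watts) taken from three rows of the table
    rows = [
        ("SD 1.5", "RTX 4090", 2, 350),
        ("SDXL", "RTX 3060", 30, 150),
        ("Flux.1 Dev (FP16)", "RTX 4090", 18, 400),
    ]
    for model, gpu, seconds, watts in rows:
        kwh = kwh_per_1k_images(watts, seconds)
        cost = electricity_cost_per_1k(watts, seconds)
        print(f"{model:<18} {gpu:<9} {kwh:6.3f} kWh  ${cost:.3f}")
```

Passing a different `rate` (for example your local tariff in $/kWh) rescales every row linearly, so the ranking between models does not change.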

For the power bill math behind 24/7 workloads more broadly, see the electricity cost article.

Three things stand out here:

  1. Electricity is not where the cost difference actually hurts. At 10,000 images/month, Flux.1 Dev on an RTX 4090 costs $3.53 in electricity vs. $0.34 for SD 1.5. That’s $3.19/month extra — annoying but not meaningful.

  2. The RTX 3060 penalty on SDXL is real. It uses more electricity per image than the RTX 4090 because the slower generation means the GPU runs hot longer, even at lower wattage. At 30 seconds per SDXL image, a 3060 uses 1.25 kWh per 1,000 images — vs. 0.31 kWh for the 4090.

  3. The 4090 is genuinely efficient per image. More watts at the wall, but far fewer seconds of work — net kWh per image drops.


Cloud rental cost per image on RunPod

If you don’t own a GPU, you rent one. RunPod Community Cloud pricing as of May 2026:

GPU | $/hr | SD 1.5 (images/hr) | SDXL (images/hr) | Flux Dev (images/hr)
RTX 4090 | $0.34 | 1,800 | 1,125 | 200
RTX 3090 | $0.19 | 1,090 | 650 | 130

Cost per 1,000 images on RunPod:

Model | GPU | $/hr | Images/hr | Cost per 1K images
SD 1.5 | RTX 3090 | $0.19 | 1,090 | $0.17
SD 1.5 | RTX 4090 | $0.34 | 1,800 | $0.19
SDXL | RTX 3090 | $0.19 | 650 | $0.29
SDXL | RTX 4090 | $0.34 | 1,125 | $0.30
Flux.1 Schnell | RTX 4090 | $0.34 | ~700 | $0.49
Flux.1 Dev | RTX 3090 | $0.19 | 130 | $1.46
Flux.1 Dev | RTX 4090 | $0.34 | 200 | $1.70

At 10,000 images/month: SD 1.5 on RunPod costs $1.90; Flux.1 Dev costs $17. At 100,000 images/month, that’s $19 vs. $170 — the point at which buying a GPU starts to pencil out for Flux workflows.
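The rental figures come from dividing the hourly rate by throughput. A small sketch of that arithmetic is below. It assumes you are billed only for time spent generating (no idle pod time, storage, or startup) and that the quoted hourly rates hold; the function names are made up for the example.

```python
def cloud_cost_per_1k(usd_per_hour, images_per_hour):
    """Rental cost in USD per 1,000 images at a given hourly rate and throughput."""
    return usd_per_hour / images_per_hour * 1000


def monthly_cloud_cost(usd_per_hour, images_per_hour, images_per_month):
    """Rental cost in USD for a month's volume, billed only while generating."""
    return cloud_cost_per_1k(usd_per_hour, images_per_hour) * images_per_month / 1000


if __name__ == "__main__":
    # Flux.1 Dev on the two GPUs, using the rates and throughput from the table
    for gpu, rate, throughput in [("RTX 3090", 0.19, 130), ("RTX 4090", 0.34, 200)]:
        per_1k = cloud_cost_per_1k(rate, throughput)
        monthly = monthly_cloud_cost(rate, throughput, 100_000)
        print(f"Flux.1 Dev on {gpu}: ${per_1k:.2f} per 1K, ${monthly:.0f} at 100K images/month")
```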

Two counterintuitive findings:

  • RTX 3090 is barely cheaper than 4090 for SD 1.5 and SDXL because the RTX 4090’s higher throughput offsets its higher hourly rate.
  • Flux.1 Dev on the RTX 3090 is about 14% cheaper per image than on the RTX 4090 ($1.46 vs. $1.70 per 1,000), because the $0.19/hr rate more than offsets the lower throughput.

When to own vs. rent: the break-even math

Using SDXL on a used RTX 3090 ($680) as the reference case:

  • Cloud cost: $0.29/1,000 images
  • Local electricity only: $0.081/1,000 images
  • Savings per 1,000 images: $0.209
  • Images needed to break even: $680 ÷ $0.000209 per image ≈ 3.25 million images

At 10,000 images/month, that’s 27 years. At 200,000 images/month, it’s 16 months.

For Flux.1 Dev, the calculation improves: cloud costs $1.70/1,000 vs. roughly $0.35 in electricity on an owned RTX 4090 ($2,000 new). Rounding the saving down to $1.30/1,000 gives a break-even of about 1.54 million images. At 100,000 images/month, that’s 15 months. At 50,000/month, about 31 months.
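Both break-even cases follow the same arithmetic: hardware price divided by the saving per image. The sketch below reproduces it under the same simplifying assumptions as the text (no resale value, no cooling or maintenance costs, cloud prices held constant); the function names are illustrative.

```python
def break_even_images(hardware_usd, saving_per_1k_usd):
    """Images needed before owning beats renting, given the saving per 1,000 images."""
    if saving_per_1k_usd <= 0:
        raise ValueError("owning never pays off unless renting costs more per image")
    return hardware_usd / (saving_per_1k_usd / 1000)


def months_to_break_even(hardware_usd, saving_per_1k_usd, images_per_month):
    """Months of generation at a fixed monthly volume until break-even."""
    return break_even_images(hardware_usd, saving_per_1k_usd) / images_per_month


if __name__ == "__main__":
    # SDXL on a used RTX 3090: $0.29 cloud minus $0.081 electricity per 1,000 images
    sdxl_saving = 0.29 - 0.081
    print(f"SDXL, RTX 3090: {break_even_images(680, sdxl_saving):,.0f} images")
    print(f"  at 200,000/month: {months_to_break_even(680, sdxl_saving, 200_000):.1f} months")

    # Flux.1 Dev on a new RTX 4090, using the $1.30 per 1,000 saving from the text
    print(f"Flux.1 Dev, RTX 4090: {break_even_images(2000, 1.30):,.0f} images")
    print(f"  at 100,000/month: {months_to_break_even(2000, 1.30, 100_000):.1f} months")
```

Because volume sits in the denominator, halving your monthly image count doubles the payback period, which is why the same card looks reasonable at 200,000 images a month and hopeless at 10,000.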

The practical conclusion: for hobbyists and small studios, renting wins on economics alone. The GPU only makes sense if you need it for other workloads (LLMs, fine-tuning, gaming) so the cost isn’t allocated entirely to image generation. That shared-use case is the main reason to own hardware — not pure image generation economics.

For a detailed look at how cloud vs. local math works out for LLM inference, the same framework applies in the RunPod vs. local GPU article.


The Flux licensing trap

One number the cost tables don’t capture: Flux.1 Dev is non-commercial. If you’re generating images for a business, product, or any monetized output, you need Flux.1 Schnell (Apache 2.0) or the commercial Flux.1 Pro API (not locally runnable).

This is worth flagging explicitly because most Flux discussion online conflates Dev and Schnell. Dev images look better at equivalent steps. Dev is also what most comparison screenshots use. But Dev cannot be used for commercial output under its license terms.

Schnell at 4 steps gets you most of the Flux visual quality advantage over SDXL, commercially, at roughly 1.6× the per-image rental cost of SDXL on the same hardware ($0.49 vs. $0.30 per 1,000 images on an RTX 4090). For product photography mockups, character sheets, or dataset augmentation, that trade-off is often worth making. For personal art generation, Dev’s quality ceiling is accessible at no licensing cost.


Honest take

Use SD 1.5 if: throughput is the constraint, your GPU has 4–8 GB VRAM, or you’re generating training data at scale where cost per image matters more than quality per image.

Use SDXL if: you need 1024×1024 native output, better composition and text rendering, and you have at least 6 GB VRAM. This is the best cost-to-quality ratio for most practical image generation workflows in 2026.

Use Flux.1 Schnell if: output quality is the priority and you have 12+ GB VRAM for FP8 (ideally 16–24 GB). Commercially licensed. Accept that each image will take longer than SDXL on the same hardware (roughly 1.3–1.9× on an RTX 4090 and 3–4× on an RTX 3090 in the benchmarks above) and cost proportionally more on cloud.

Use Flux.1 Dev if: you’re doing personal creative work, have a 24 GB card, and want the best image quality currently achievable with a locally runnable open-weight model. Do not use it for commercial output.

On the RTX 4060 Ti 16GB specifically: FP8 Flux is the sweet spot — 12–15 GB puts it in range, generation time is tolerable, and you’re not paying for 24 GB you don’t need for SDXL or SD 1.5. See the RTX 4060 Ti 16GB vs RX 7900 XT comparison for how that card stacks against AMD’s alternative.

The cost difference at low volumes is too small to drive decisions. At high volume, SD 1.5 on a rented RTX 3090 remains the cheapest image-per-dollar option in the RunPod comparison. Flux gets expensive fast, not because of electricity, but because slower generation costs more on rented hardware and amortizes owned hardware more slowly.




Last updated May 20, 2026. GPU prices and cloud rental rates change frequently — verify current rates before making purchasing decisions.

