Qualcomm's $10B Tenstorrent Bid: What RISC-V AI Cards Mean for Home Labs in 2026

tenstorrentrisc-vqualcommai-acceleratorlocal-llmhardwarenvidiabenchmark

TL;DR: Qualcomm is reportedly in talks to buy Jim Keller’s Tenstorrent for up to $10B, and the cards at the center of it — the Blackhole p150a (32GB, $1,399) and p100a (28GB, $999) — are buyable today. They give you more VRAM than a used RTX 3090 at lower power, but roughly half the memory bandwidth and a much younger software stack. For pure local LLM inference in 2026, a used 3090 still wins on tokens/sec and ecosystem.

Blackhole p150aBlackhole p100aUsed RTX 3090
Memory32GB GDDR628GB GDDR624GB GDDR6X
Bandwidth512 GB/s448 GB/s936 GB/s
Price$1,399$999~$1,070 (used)
Power (TBP)300W300W350W
SoftwareTT-Metalium (Apache 2.0), youngSameCUDA, mature
Best forRISC-V/open-stack devs, big-model headroomCheapest 28GB inference cardFastest tok/s for the money

Honest take: Buy a Tenstorrent card if you want to develop on an open RISC-V stack or you need 28–32GB at 300W and you enjoy being early. If you just want a local LLM that runs fast tonight, a used RTX 3090 is still the better buy — and don’t buy speculatively hoping the Qualcomm deal helps you. Acquisitions disrupt roadmaps before they improve them.

What Qualcomm is actually trying to buy

On June 15, 2026, The Information reported — and Reuters, Tom’s Hardware, and The Register picked up — that Qualcomm is in advanced talks to acquire Tenstorrent at a valuation between $8 billion and $10 billion. The deal is said to be cash and stock with performance-based adjustments still under negotiation, and nothing is final. QCOM stock jumped over 4% on the news.

Tenstorrent is the AI-chip startup founded in Canada in 2016 and run by Jim Keller — the silicon architect behind Apple’s A-series, AMD’s Zen, and Tesla’s FSD chip. What makes Tenstorrent different from every other AI-accelerator name you’ve seen acquired is the architecture: it’s built on open RISC-V CPU IP paired with proprietary Tensix AI cores, and the entire software stack is open source. For a phone-SoC company like Qualcomm watching smartphone growth flatten, buying a RISC-V inference architecture is a shortcut into the part of AI spend that’s growing fastest — inference, not training.

That’s the boardroom story. The reason it matters on this site is narrower: the hardware Tenstorrent sells is consumer-accessible. You can put a PCIe card in a cart on tenstorrent.com right now. So the real question for a home lab isn’t “will the deal close” — it’s “does this hardware do anything a used 3090 or a 5060 Ti doesn’t, and should I wait to find out?”

The cards you can actually buy

Tenstorrent sells two generations of PCIe inference cards. The current Blackhole generation is the headline; the older Wormhole generation is what most of the software has actually been tuned against.

CardTensix coresSRAMMemoryBandwidthTBPPrice
Blackhole p100a120180MB28GB GDDR6448 GB/s300W$999
Blackhole p150a140210MB32GB GDDR6512 GB/s300W$1,399
Wormhole n150d72108MB12GB GDDR6288 GB/s160W$1,099
Wormhole n300d128 (2 ASIC)192MB24GB GDDR6576 GB/s$1,449$1,449

The Blackhole p150a is the interesting one. It carries 32GB of GDDR6 — the same capacity as an RTX 5090, more than any 3090/4090 — for $1,399, draws 300W, and adds four QSFP-DD 800G ports for clustering cards together. The p100a undercuts it at $999 for 28GB. On paper, Tenstorrent claims the p150a matches an RTX 4090 in FP8 and BF16 TFLOPS at lower power (300W vs the 4090’s 450W).

The catch is in the column most buyers skip: memory bandwidth. Decode speed — the tokens-per-second you actually watch stream — is bound by how fast the chip can read the model’s weights out of memory, not by peak TFLOPS. We’ve made this point in our NPU vs GPU breakdown and it applies here exactly. The p150a’s 512 GB/s is barely over half a used RTX 3090’s 936 GB/s, and a third of an RTX 5090’s ~1,792 GB/s. More capacity, slower reads.

The bandwidth math, in tokens

Here’s why that gap matters. A 7B model in Q4_K_M is roughly 4.5GB of weights. To generate one token, the chip reads the active weights once. Crudely, decode throughput scales with bandwidth ÷ bytes-read-per-token, so the ceiling is set by GB/s.

A used RTX 3090 at 936 GB/s does roughly 95 tok/s on a 7B Q4 model — a number we’ve measured repeatedly across the site. Scale that by bandwidth and the Blackhole p150a’s theoretical ceiling lands near 50–55 tok/s on the same model, before you account for software efficiency. And software efficiency is the second tax: early community benchmarks suggest Blackhole reaches 40–60% of its theoretical TFLOPS on real LLM workloads, versus 60–80% for NVIDIA cards with mature kernels. Stack the two together and the practical decode speed on a single Blackhole card for everyday 7B–14B models is well under what a 3090 delivers.

Where Tenstorrent’s numbers look strong is aggregate, batched throughput — the data-center metric, not the single-user one:

SystemModelThroughputNotes
TT-QuietBox 2 (4× Blackhole)Llama 3.1 70B476.5 tok/sAggregate, vendor figure
Wormhole Galaxy (32 ASIC)Llama 70B, batch 32~4,000–5,000 tok/sVendor, not independently verified
8× H100 SXM5 (vLLM)Llama 70B, batch 32~2,500–3,500 tok/sFor comparison

Read those carefully. The 476.5 tok/s on the TT-QuietBox 2 — four Blackhole ASICs, 480 Tensix cores, 2,654 TFLOPS BlockFP8, 128GB GDDR6, starting at $9,999 — is aggregate throughput across concurrent requests, not what one person watching one chat session feels. The Galaxy numbers come from Tenstorrent’s own controlled runs on TT-Metal, not independent third-party audits, and a single user pulling one stream off any of these boxes sees a fraction of the aggregate. For a home lab running one or two sessions at a time, batched throughput is the wrong yardstick.

The software is the real story (and the real risk)

The thing that genuinely sets Tenstorrent apart isn’t the silicon — it’s that the entire stack is Apache 2.0 open source. TT-Metalium (the low-level kernel programming model) and TTNN (the operator library) are both Apache 2.0. There’s a Tenstorrent-maintained fork of vLLM, and tt-inference-server wraps it in an OpenAI-compatible API so you can point existing tooling at it. If you’ve been frustrated by CUDA’s black-box nature, this is the most open serious AI accelerator you can buy. It’s the same “no-CUDA-required” pitch we examined with Intel Arc and AMD ROCm, but taken further — the kernels are yours to read and rewrite.

The flip side is maturity. As of early 2026, most verified model support and documentation targets the Wormhole n150/n300 cards; Blackhole software is earlier in its cycle. There’s an experimental llama.cpp discussion thread for Grayskull/Wormhole, but it’s community work, not a polished path. CUDA, by contrast, runs every model on day one with Ollama, LM Studio, llama.cpp, vLLM, and ComfyUI. With a Tenstorrent card you’re buying into a roadmap, not a finished product — you’ll spend time getting models running that would be a one-line ollama pull on NVIDIA. If you like building on an open stack (this is firmly FOSS territory), that’s a feature. If you want to run a model tonight, it’s friction.

RISC-V vs CUDA: why Qualcomm cares and you might not (yet)

The strategic logic is sound. Inference is becoming the dominant cost in AI, NVIDIA’s CUDA moat is built around training, and a RISC-V architecture with an open stack is the kind of thing a hyperscaler or a phone-SoC giant can build a vertically integrated product around without paying the NVIDIA tax. That’s a multi-year, billion-dollar bet — exactly the bet Qualcomm appears to be making.

None of that changes what a single PCIe card does in your desktop in 2026. RISC-V vs CUDA is a developer-ecosystem question, and ecosystems take years. The cards are real and competitively priced, but the software gap is the entire ballgame for a home user, and that gap closes on a timeline measured in software releases, not press releases.

And the acquisition itself is a reason for inference buyers to wait, not rush. Acquisitions reshuffle roadmaps — product lines get renamed, deprecated, refocused on the acquirer’s priorities (Qualcomm’s are mobile and edge, not necessarily your desktop tower). Buying a Tenstorrent card today on the theory that “Qualcomm money will make it better” is backwards: the disruption comes first. This mirrors the broader 2026 supply picture we covered in NVIDIA skipping new consumer GPUs — interesting hardware, uncertain timing.

Who should actually buy one

Buy a Blackhole p100a ($999) or p150a ($1,399) if:

  • You want to develop on an open RISC-V/Tensix stack and the openness is the point.
  • You need 28–32GB of VRAM at 300W and you’re comfortable doing setup work that’s a one-liner on NVIDIA.
  • You’re running batched, multi-user serving where aggregate throughput matters more than single-stream latency.

Skip it and buy a used RTX 3090 (~$1,070) if:

  • You want the fastest single-user tok/s for the money. 936 GB/s beats 512 GB/s, and CUDA’s maturity means everything just works.
  • You run Ollama, LM Studio, ComfyUI, or vLLM and want day-one model support.
  • You’re new to local AI and want the path of least resistance.

Buy an RTX 5090 or 4090 if you want maximum bandwidth (the 5090’s ~1,792 GB/s is in another class) and you’re not price-sensitive. For coding-agent workloads specifically, our sister site aicoderscope.com tracks which models and tools justify which hardware.

The honest summary: Tenstorrent makes the most open AI accelerator you can buy, and at $999–$1,399 the prices aren’t a gimmick. But “open” and “cheap VRAM” don’t beat “fast and supported” for everyday local inference yet. The Qualcomm deal is a vote of confidence in where this architecture is going — not a reason to be there before the software arrives.

FAQ

Is the Qualcomm–Tenstorrent acquisition confirmed? No. As of June 15–17, 2026, multiple outlets report Qualcomm is in advanced talks at an $8–10B valuation, structured as cash and stock with performance adjustments. No final agreement has been announced, and terms could change.

Can I run Ollama or llama.cpp on a Tenstorrent card? Not the way you do on NVIDIA. Tenstorrent’s supported path is its vLLM fork via tt-inference-server (OpenAI-compatible API). There’s an experimental community llama.cpp discussion for Grayskull/Wormhole, but it isn’t a turnkey path. Blackhole software support trails Wormhole.

How does the Blackhole p150a compare to an RTX 4090 for LLMs? On paper they’re close in FP8/BF16 TFLOPS, and the p150a draws less power (300W vs 450W) with more VRAM (32GB vs 24GB). In practice, the 4090’s higher bandwidth and mature CUDA kernels make it faster and far easier to use for single-user local inference today.

Is Tenstorrent’s software really fully open source? Yes — TT-Metalium and TTNN are Apache 2.0 licensed, and the vLLM integration and inference server are open. It’s the most open serious AI-accelerator stack available, which is its biggest differentiator from CUDA.

Should I wait for the deal to close before buying? For pure inference, waiting is reasonable. Acquisitions disrupt roadmaps before improving them, and Qualcomm’s priorities (mobile, edge) may not align with desktop add-in cards. If you want to develop on the open stack now, the cards work today regardless of the deal.

Sources

Last updated June 24, 2026. Prices and specs change; the Qualcomm–Tenstorrent deal was unconfirmed at publication. Verify current rates and deal status before purchasing.

Was this article helpful?