Jun 16, 2026

LM Studio Locally + LM Link 2026: Control Your Home GPU Rig From Your iPhone

By RunAIHome Team · 14 min read

lm-studiolm-linklocallyiphonetailscalelocal-llmapple-silicontutorialremote-access

TL;DR: LM Studio 0.4.16 (June 4, 2026) shipped Locally for iPhone/iPad and LM Link — an end-to-end encrypted Tailscale mesh that lets your phone run the models on your home Mac or RTX rig over any network. Inference stays on the rig; only a device list touches LM Studio’s servers. It works, it’s private, and on a Mac Studio M4 Max a 7B model streams at ~87 tok/s — far faster than you read on a phone.

What you’ll be able to do after this guide:

Chat with a 70B model running on your home machine from your phone on cellular, with chats stored only on your devices
Connect across CGNAT, hotel Wi-Fi, and corporate firewalls with zero port forwarding
Know which model sizes actually feel responsive over a remote link, and which don’t

Honest take: If you already own a capable home rig and a Mac/PC running LM Studio, LM Link is the cleanest remote-access setup we’ve tested — no reverse proxy, no exposed ports, no SSH tunnel. The catch is it’s account-gated and LM Studio-only on both ends, so it’s a convenience layer for existing LM Studio users, not a general remote-inference server.

What LM Studio actually shipped

LM Studio 0.4.16 landed on June 4, 2026 (build 1) with two connected pieces:

Locally — a first-party iPhone and iPad app. LM Studio acquired the standalone Locally AI app in April 2026 and rebuilt it as the official mobile front end. It’s the chat client; it does not run large models itself.
LM Link — the transport. It connects devices you own that are signed into the same LM Studio account, then lets a “client” device (your phone) load and use models running on a “host” device (your Mac or PC) as if they were local.

Build 2, on June 8, 2026, removed the request-gated waitlist so LM Link is open to everyone, and bumped the default context length to 8k tokens. Launch is iPhone and iPad only — Android has not been announced.

The important architectural detail: inference never moves. Your phone is a thin client. The model weights load into your rig’s VRAM or unified memory, tokens generate there, and only the text streams back to the phone. That’s the entire reason this is interesting for home-lab owners — your phone’s 8GB of RAM was never going to run a 70B model, but your Mac Studio or 24GB GPU can, and now you can reach it from the couch or a train.

How LM Link works under the hood

LM Link is built on Tailscale, the WireGuard-based mesh VPN, but you don’t install or configure Tailscale yourself. LM Studio embeds tsnet — a userspace library version of Tailscale that runs entirely inside the app. That matters for three reasons:

No kernel changes, no admin rights. tsnet is a userspace Go program that adds mesh networking without touching kernel sockets, system routing tables, or global VPN settings. Installing LM Link doesn’t reroute your other traffic.
NAT traversal without port forwarding. The tunnel punches through CGNAT, corporate firewalls, and double-NAT home routers. Two devices find each other through Tailscale’s coordination servers and then connect directly. You never open a port on your router or expose anything to the public internet.
End-to-end encryption via WireGuard. Prompts, responses, model listings, and hardware info travel only between your devices. Per LM Studio, neither Tailscale nor LM Studio’s backend can read the contents — the only thing that touches LM Studio’s servers is your device discovery list, so the two devices can find each other.

On the host, remote models are still served through the standard localhost:1234 OpenAI-compatible endpoint, which is why the same setup works with any tool that already talks to LM Studio’s local server. If you’ve read our local AI privacy audit, this is the rare remote-access feature that doesn’t blow a hole in the “data stays on my machine” promise: the inference and the chat history both stay on hardware you own.

Setup: phone to rig in about five minutes

You need LM Studio 0.4.16 or later on the host (Mac, Windows, or Linux), an LM Studio account, and the Locally app on an iPhone or iPad. Both devices sign into the same account — that shared identity is what authenticates the link. No API keys, no static tokens.

Step 1 — Update and prep the host

Update LM Studio to 0.4.16+ on the machine that has the GPU or Apple Silicon. Sign in to your LM Studio account (top-right). Download at least one model you want to reach remotely — a 7B–14B model is the sweet spot for phone use; more on why below.

Step 2 — Enable LM Link on the host

Open LM Studio settings and toggle LM Link on. Since build 2, there’s no waitlist. The host registers itself in your account’s device list. Load the model you want available — LM Link exposes whatever is currently loaded (or loadable) on the host through the link.

# Sanity-check the local server is up before going remote.
# On the host, LM Studio serves an OpenAI-compatible endpoint:
$ curl http://localhost:1234/v1/models

{"data":[{"id":"qwen2.5-7b-instruct","object":"model", ... }],"object":"list"}

If that returns your model, LM Studio’s server is healthy and LM Link has something to serve.

Install Locally from the App Store on your iPhone or iPad. Sign in with the same LM Studio account, then enable LM Link in Locally’s settings.

Step 4 — Let the devices discover each other

With LM Link on at both ends, the host and phone discover each other over the mesh — regardless of which networks they’re on. Your home Mac on Wi-Fi and your phone on LTE will still find each other. In Locally, pick the host device, pick the model, and start chatting. The first connection can take a few seconds while the tunnel establishes; after that it’s persistent.

Step 5 — Verify it’s actually remote

The honest test: put your phone on cellular only (turn off Wi-Fi), then send a prompt. If it streams a reply, you’re running a model on your home rig from the cellular network with no ports open and nothing exposed to the internet. That’s the whole point.

Real-world latency: what actually feels fast

This is where expectations need calibrating. “Latency” over LM Link has two parts, and only one of them is the network.

Part 1 — the network round trip. Because LM Link uses Tailscale, the best case is a direct WireGuard connection between your phone and rig, where added latency is just the cellular/Wi-Fi round trip — typically tens of milliseconds on LTE, lower on 5G. If a direct path can’t be established (some restrictive carrier or corporate NATs), Tailscale falls back to a relay, which adds more latency. Either way, this is a one-time cost on connection plus a small per-message overhead. It is not the thing you’ll notice.

Part 2 — token generation speed. This dominates the experience, and it’s entirely determined by your host hardware, not the link. Streaming hides network latency well: as long as the rig generates tokens faster than you read them, the reply feels live.

So the real question isn’t “is LTE fast enough” — it’s “is your host fast enough.” Here’s the calibration that matters. Comfortable reading speed is roughly 7–10 tokens per second. On a phone, where you read in shorter bursts, anything above ~15 tok/s feels essentially instant. Now map that to verified Apple Silicon numbers:

Model on host	Mac Studio M4 Max (546 GB/s)	Feels like on a phone
Qwen2.5 7B Q4_K_M	~87 tok/s	Instant — text appears faster than you can read
14B Q4_K_M	~40–50 tok/s	Instant for reading
Llama 3.3 70B Q4_K_M	~20–28 tok/s	Comfortable; faster than reading speed
70B at long context	drops toward ~18 tok/s	Still readable, slight wait on first token

The takeaway: on a Mac Studio M4 Max, even a 70B model at 20–28 tok/s comfortably outpaces phone reading speed, so the remote experience feels good. The bottleneck you’ll actually hit is time to first token on big models with long prompts — the host has to process your context before the first word appears. A 7B model is near-instant; a 70B model with a long conversation history makes you wait a beat. For phone use, that’s the argument for keeping a fast 7B–14B model loaded as your default and reserving the 70B for when you specifically want depth.

If your host is a Mac Mini M4 Pro instead, expect roughly half those rates (around 14 tok/s on 70B) — still readable, but the Mac Mini M4 Pro is happier serving 7B–14B over the link. On an RTX rig, a 24GB card like a used 3090 runs a 7B model near 95 tok/s, which is instant for any reading; its limit is model size, not speed.

Which model sizes to actually use over the link

Match the model to the moment:

7B–8B (Qwen2.5 7B, Llama 3.1 8B, Gemma 4 12B): the default for phone use. Quick questions, code snippets, summaries. Streams faster than you read on any capable host.
14B–32B: the sweet spot when you want better reasoning and your host has the bandwidth. A Mac Studio or RTX rig handles these comfortably; check our VRAM tier guide for what fits.
70B and up: reach for these deliberately, not as a default. They’re the reason LM Link exists — your phone can’t run them, your home rig can — but accept a slower first token. Keep one loaded only when you know you want it.

For coding-specific work from your phone, a smaller instruct model handles quick edits, but for serious agentic coding you’ll want to be at the desk; our writeup of a local AI coding stack on the sister site covers that workflow.

Common problems and fixes

Devices don’t discover each other. Confirm both are signed into the same LM Studio account and that LM Link is toggled on in both LM Studio and Locally. The account identity is the only thing pairing them.

Connection works on Wi-Fi but not cellular. This is usually a direct-vs-relay path issue on a restrictive carrier NAT. It should still connect via relay; if it stalls, toggle LM Link off and on at the host to re-register, and give the tunnel a few seconds to establish.

Replies are slow / long pause before the first word. That’s host-side, not network. You’re either running a model too large for the host’s bandwidth or hitting time-to-first-token on a long context. Switch to a smaller loaded model, or lower the context length (build 2 defaults to 8k — fine for most phone chats).

“No model available” in Locally. The host has to have a model loaded or loadable through LM Link. Load one in LM Studio on the host first, then refresh in Locally.

You don’t own a capable host at all. LM Link only relays to your hardware — it isn’t a rental service. If you don’t have a Mac or GPU rig that can hold the model, renting a cloud GPU and running LM Studio’s server there is the alternative; an on-demand RunPod instance gets you a 24GB–80GB GPU by the hour without buying hardware. For the buy-vs-rent math, see our RunPod vs local GPU breakdown.

Privacy: what’s actually leaving your devices

Worth stating plainly because it’s the feature’s strongest selling point. With LM Link:

Prompts and responses travel only between your phone and your rig, WireGuard-encrypted.
Chat history stays on your devices — Locally stores chats on the phone, not in the cloud.
The only thing touching LM Studio’s servers is the device discovery list, so the two endpoints can find each other.

Compare that to running a model through any cloud chat app, where every prompt hits a third-party server. If you’ve gone to the trouble of building a local AI setup for privacy, LM Link is one of the few remote-access options that doesn’t quietly undo it. It is not, however, a substitute for understanding your own threat model — you’re still trusting LM Studio’s account system for device pairing.

Is it worth setting up?

For an existing LM Studio user with a capable home rig: yes, easily. It replaces the fiddly stack of dynamic DNS, port forwarding, reverse proxies, or SSH tunnels with a five-minute, zero-config, encrypted link — and it keeps both inference and chat history on hardware you own. The free Preview makes the cost of trying it zero.

The honest limitations: it’s iPhone/iPad-only at launch, both ends must run LM Studio, and it’s gated behind an LM Studio account, so it’s a convenience layer for the LM Studio ecosystem rather than a universal remote-inference server. If you live in Ollama or llama.cpp, this won’t pull you over by itself — see our Ollama vs LM Studio vs llama.cpp comparison for where each one wins. But if LM Studio is already your daily driver, LM Link turns the expensive rig in your closet into something you can actually reach from anywhere.

FAQ

Does my phone need a powerful chip to use LM Link? No. The phone is a thin client — it only displays streamed text. All model loading and token generation happen on the host Mac or PC. Any iPhone or iPad that runs Locally works.

Is LM Link free? It’s free during the Preview that opened with 0.4.16 build 2 (June 8, 2026). LM Studio has said free and paid tiers are planned at general availability, but pricing hasn’t been announced.

Does my data go through LM Studio’s or Tailscale’s servers? No for content. Prompts and responses are end-to-end encrypted between your devices via WireGuard. Only your device discovery list touches LM Studio’s servers, purely so your devices can find each other.

Do I need to install Tailscale separately? No. LM Studio embeds Tailscale’s userspace library (tsnet) inside the app. You don’t install or configure Tailscale, and it doesn’t change your device’s global network settings.

Can I use this on Android? Not yet. At launch, Locally is iPhone and iPad only. Android has not been announced.

What model size should I run for phone use? A 7B–14B model at Q4_K_M is the sweet spot — it streams faster than you read on any capable host. Use 70B only when you specifically want more depth and can accept a slower first token.

Will it work on cellular behind a strict carrier NAT? Usually yes. Tailscale’s NAT traversal handles CGNAT and most restrictive networks; if a direct connection can’t form, it falls back to an encrypted relay, which adds some latency but still connects.

Sources

Last updated June 16, 2026. Software versions, pricing, and benchmark numbers change; verify current details before relying on them.

Was this article helpful?