Does local AI autocomplete need an internet connection?

No. Once the model is downloaded, Shadowtype runs 100% on-device. There is no cloud inference call, so completions work on a plane, behind a firewall, or fully offline — and they keep working with no account and no subscription server to phone home to.

Which models does Shadowtype run locally?

Shadowtype ships a catalog of free GGUF models including Qwen3 Base (sharp fill-in-the-middle continuation), Gemma 3, and MoE / small-MoE variants — and you can drop in your own GGUF too. They run with llama.cpp accelerated by Metal on Apple Silicon. Lighter models give the fastest ghost text; larger ones give richer completions. You pick the trade-off.

Is local AI autocomplete private?

Yes. Because inference is on-device, your text is never transmitted for completion. Shadowtype ships with zero telemetry and no analytics backend, requires no account, and works offline — so private autocomplete is the default, not a setting you have to find.

Local AI autocomplete · macOS

Local AI autocomplete for Mac —
a real LLM, running on-device.

Shadowtype runs an actual language model on your Apple Silicon chip — not in the cloud. It reads what you’re typing and predicts the rest as inline ghost text in any app; Tab accepts a word, ⌥Tab a whole line. Inference happens locally with llama.cpp + Metal, so completions land in under ~150ms on an M2 and no keystroke is ever sent to a server.

Download Shadowtype free View source

100% on-device
No cloud inference
Model choice
Zero telemetry

What “local AI autocomplete” means

The model lives on your Mac

Cloud autocomplete streams your keystrokes to a server, runs the model there, and sends predictions back. Local AI autocomplete flips that: the model is downloaded once and every prediction is computed on your own silicon. Here’s the loop.

1

You type, on-device context is read

As you write in any text field, Shadowtype reads the surrounding text locally through macOS accessibility — never by uploading it anywhere.

2

llama.cpp + Metal runs the LLM

A GGUF model from the built-in catalog (Qwen3 Base, Gemma 3, MoE variants — or bring your own) runs through llama.cpp with Metal acceleration on your Apple Silicon GPU. No API call, no round-trip latency.

3

Ghost text appears at the caret

The prediction shows as dimmed inline ghost text right where you’re typing — typically in under ~150ms on an M2, fast enough to feel like part of the keyboard.

4

Tab to accept, keep typing to ignore

Tab accepts a word, ⌥Tab a whole line. Don’t like it? Just keep typing and it dissolves. You stay in control of every character.

Why on-device matters

Real local inference, real advantages

Local = fast

No network round-trip means no waiting on a server. Ghost text typically lands in under ~150ms on an M2 because the LLM runs right on your GPU via Metal. Lighter models go faster still.

Local = private

Because nothing is sent for completion, your words can’t leak. Shadowtype is private by design with zero telemetry and no account — the model simply never talks to the internet.

Local = no cloud bill

There’s no inference API metering tokens behind the scenes, so there’s no per-use cost and no subscription to fund a server. Shadowtype is free and open source.

Your model, your trade-off

Pick from a catalog of free GGUF models — Qwen3 Base, Gemma 3, MoE and small-MoE variants — or drop in your own. Choose a small model for instant ghost text or a larger one for richer completions. Swap any time.

Works in every app

Mail, Slack, Notes, your editor, the browser — continuous inline completion anywhere you can type, plus selection rewrite on ⌥⌘K when you want to reshape text you’ve already written.

Always offline-ready

Once the model is on disk, completion works offline — on a plane, behind a firewall, anywhere. There’s no server to be down and no account to expire.

Beyond the buzzword

Local that’s actually local

Plenty of apps say “AI” while quietly calling a cloud API. Shadowtype runs the whole model on your machine and personalizes to you — locally. You can add per-app instructions (a different voice for Mail than for Slack), and it adapts to your phrasing over time, all without sending a single byte off-device. If you’re weighing it against a hosted-completion tool, see how it compares or the full feature list.

Questions

Local AI autocomplete FAQ

What is local AI autocomplete?

It’s autocomplete powered by a real language model that runs entirely on your own Mac, predicting your next words inline as ghost text. Unlike cloud autocomplete, no keystrokes leave the device — inference happens on your Apple Silicon chip via llama.cpp and Metal, so it’s fast, private, and works offline.

Does it need internet?

No. After the model downloads once, Shadowtype runs 100% on-device with no cloud inference call. Completions work on a plane, behind a firewall, or fully offline — and there’s no account or subscription server to phone home to.

Which models does it run?

A catalog of free GGUF models you choose from — Qwen3 Base (sharp fill-in-the-middle), Gemma 3, MoE and small-MoE variants — all running through llama.cpp with Metal acceleration. You can also drop in your own GGUF via the BYOM picker. Lighter models give the fastest ghost text; larger ones give richer completions — pick the trade-off that fits your Mac.

Can my own scripts and editors use the same local model?

Yes. Shadowtype exposes an OpenAI-compatible HTTP endpoint on 127.0.0.1 with no key required, so your terminal, editor plugins, and agents can call /v1/chat/completions against the same on-device model. There’s also a built-in MCP server for Claude Code and other MCP clients. Everything is bound to localhost — nothing leaves your Mac.

Is it private?

Yes. Because inference is on-device, your text is never transmitted for completion. Shadowtype ships with zero telemetry and no analytics backend, needs no account, and works offline — so privacy is the default, not a setting to hunt for.

Ready when you are

Run the LLM on your Mac, not someone’s server.

Download Shadowtype free, accept your first word with Tab, and upgrade once — never monthly.

Download for macOS View source

100% on-device
No subscription
No account
Zero telemetry
Open source

Local AI autocomplete for Mac —a real LLM, running on-device.