Local AI autocomplete · macOS

Local AI autocomplete for Mac —
a real LLM, running on-device.

Shadowtype runs an actual language model on your Apple Silicon chip, not in the cloud. It reads what you’re typing and predicts the rest as inline ghost text in any app: Tab accepts a word, ⌥Tab a whole line. Inference happens locally with llama.cpp and Metal, so completions typically land in under 150 ms on an M2 and no keystroke is ever sent to a server.

  • 100% on-device
  • No cloud inference
  • Model choice
  • Zero telemetry
What “local AI autocomplete” means

The model lives on your Mac

Cloud autocomplete streams your keystrokes to a server, runs the model there, and sends predictions back. Local AI autocomplete flips that: the model is downloaded once and every prediction is computed on your own silicon. Here’s the loop.

1. You type, on-device context is read

As you write in any text field, Shadowtype reads the surrounding text locally through macOS accessibility — never by uploading it anywhere.

2. llama.cpp + Metal runs the LLM

A GGUF model (Gemma 4 or Qwen3.5, your pick) runs through llama.cpp with Metal acceleration on your Apple Silicon GPU. No API call, no round-trip latency.

3. Ghost text appears at the caret

The prediction shows as dimmed inline ghost text right where you’re typing, typically in under 150 ms on an M2, which is fast enough to feel like part of the keyboard.

4. Tab to accept, keep typing to ignore

Tab accepts a word, ⌥Tab a whole line. Don’t like it? Just keep typing and it dissolves. You stay in control of every character.
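
For the curious, the accept-or-ignore behaviour in the last step can be modelled in a few lines of Python. This is an illustrative sketch, not Shadowtype's code: the helper names `accept_word`, `accept_line`, and `keep_typing` are hypothetical, and treating a "word" as the text up to the next whitespace is an assumption.

```python
# Illustrative sketch only: hypothetical helpers that model how ghost text
# could be accepted or dismissed. Not Shadowtype's actual implementation.


def accept_word(text: str, ghost: str) -> tuple[str, str]:
    """Tab: move the next word of the ghost text into the document."""
    stripped = ghost.lstrip()
    lead = ghost[: len(ghost) - len(stripped)]  # leading whitespace, if any
    # Assumption: a word ends at the next whitespace character.
    end = len(stripped)
    for i, ch in enumerate(stripped):
        if ch.isspace():
            end = i
            break
    word = lead + stripped[:end]
    return text + word, ghost[len(word):]


def accept_line(text: str, ghost: str) -> tuple[str, str]:
    """Option-Tab: accept the ghost text up to the end of the current line."""
    line, sep, rest = ghost.partition("\n")
    return text + line, sep + rest


def keep_typing(text: str, ghost: str, typed: str) -> tuple[str, str]:
    """Any other keystroke: keep the typed character and drop the ghost text."""
    return text + typed, ""
```

Each helper returns the new document text and whatever ghost text remains, so a caller could chain several Tab presses to accept a suggestion one word at a time.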

Why on-device matters

Real local inference, real advantages

Local = fast

No network round-trip means no waiting on a server. Ghost text typically lands in under 150 ms on an M2 because the LLM runs right on your GPU via Metal. Lighter models go faster still.
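
If you want to sanity-check a latency figure like this on your own hardware, a small timing harness is enough. The Python sketch below is a generic example, not part of Shadowtype: `timed_completion`, `within_budget`, and `fake_predict` are hypothetical names, the 150 ms threshold is simply the rough number quoted above, and `fake_predict` stands in for a real local model call.

```python
import time
from typing import Callable

# Assumption: use the roughly 150 ms figure quoted for an M2 as the budget.
BUDGET_MS = 150.0


def timed_completion(predict: Callable[[str], str], context: str) -> tuple[str, float]:
    """Run one prediction and return (completion, elapsed milliseconds)."""
    start = time.perf_counter()
    completion = predict(context)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return completion, elapsed_ms


def within_budget(elapsed_ms: float, budget_ms: float = BUDGET_MS) -> bool:
    """True when a completion arrived inside the latency budget."""
    return elapsed_ms < budget_ms


def fake_predict(context: str) -> str:
    """Hypothetical stand-in for a local model call; returns a fixed string."""
    return " world"


if __name__ == "__main__":
    completion, elapsed_ms = timed_completion(fake_predict, "hello")
    print(f"{completion!r} in {elapsed_ms:.3f} ms, within budget: {within_budget(elapsed_ms)}")
```

Swapping `fake_predict` for a function that calls a real on-device model would give a per-completion measurement you can compare across model sizes.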

Local = private

Because no text is sent anywhere for completion, there is no server-side copy of your words to leak. Shadowtype is private by design, with zero telemetry and no account: once the model is downloaded, completion never touches the internet.

Local = no cloud bill

There’s no inference API metering tokens behind the scenes, so there’s no per-use cost and no subscription to fund a server. You pay $79 once (Founders from $39) and own it.

Your model, your trade-off

Pick from a catalog of free GGUF models — Gemma 4 and Qwen3.5 variants. Choose a small model for instant ghost text or a larger one for richer completions. Swap any time.

Works in every app

Mail, Slack, Notes, your editor, the browser — continuous inline completion anywhere you can type, plus selection rewrite on ⌥⌘K when you want to reshape text you’ve already written.

Always offline-ready

Once the model is on disk, completion works offline — on a plane, behind a firewall, anywhere. There’s no server to be down and no account to expire.

Beyond the buzzword

Local that’s actually local

Plenty of apps say “AI” while quietly calling a cloud API. Shadowtype runs the whole model on your machine and personalizes to you — locally. You can add per-app instructions (a different voice for Mail than for Slack), and it adapts to your phrasing over time, all without sending a single byte off-device. If you’re weighing it against a hosted-completion tool, see how it compares or the full feature list.
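
One simple way per-app instructions could work is to prepend a short instruction to the text the model sees. The Python sketch below illustrates that idea under stated assumptions: the `APP_INSTRUCTIONS` table, its wording, and the `build_prompt` helper are all hypothetical, and Shadowtype's real prompt format is not described here.

```python
from typing import Optional

# Hypothetical per-app instructions; the wording is purely illustrative.
APP_INSTRUCTIONS = {
    "Mail": "Write in a polite, complete-sentence style.",
    "Slack": "Keep it short and casual.",
}


def build_prompt(app: str, context: str,
                 instructions: Optional[dict[str, str]] = None) -> str:
    """Prepend the instruction for this app, if any, to the typed context."""
    table = APP_INSTRUCTIONS if instructions is None else instructions
    instruction = table.get(app)
    if instruction is None:
        # No instruction configured for this app: use the context unchanged.
        return context
    return f"{instruction}\n\n{context}"
```

Because the lookup and the string building both happen in-process, nothing in this step would require a network call.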

Questions

Local AI autocomplete FAQ

What is local AI autocomplete?
It’s autocomplete powered by a real language model that runs entirely on your own Mac, predicting your next words inline as ghost text. Unlike cloud autocomplete, no keystrokes leave the device — inference happens on your Apple Silicon chip via llama.cpp and Metal, so it’s fast, private, and works offline.
Does it need internet?
No. After the model downloads once, Shadowtype runs 100% on-device with no cloud inference call. Completions work on a plane, behind a firewall, or fully offline — and there’s no account or subscription server to phone home to.
Which models does it run?
A catalog of free GGUF models you choose from, including Gemma 4 and Qwen3.5 variants, all running through llama.cpp with Metal acceleration. Lighter models give the fastest ghost text; larger ones give richer completions — pick the trade-off that fits your Mac.
Is it private?
Yes. Because inference is on-device, your text is never transmitted for completion. Shadowtype ships with zero telemetry and no analytics backend, needs no account, and works offline — so privacy is the default, not a setting to hunt for.
Ready when you are

Run the LLM on your Mac, not someone’s server.

Download Shadowtype free, accept your first word with Tab, and upgrade once — never monthly.

  • 100% on-device
  • No subscription
  • No account
  • Zero telemetry
  • 14-day refund