Syntax Docs

How Syntax serves models — local, remote self-hosted, managed remote on dUX, and hosted providers.

Every model Syntax exposes ends up running somewhere. The four inference targets are:

Target	Where it runs	Best for
Local	Your machine — GPU, Apple Silicon, or CPU.	Solo workflows, privacy-first, no network dependency.
Remote self-hosted	A box you've provisioned (your server, your GPU, your SSH).	Power users with their own hardware.
Managed remote (dUX)	dUX-managed cloud GPU.	Teams that want managed infrastructure.
Hosted provider	OpenAI, Anthropic, Google, etc.	Frontier models, predictable cost, no infra.

All four are reachable through the same Bridge endpoint. Your harness doesn't know — or care — which one is serving any given request.

How Syntax decides what to run

Two layers make the decision:

Routing. When a request arrives at the Bridge, the active model policy picks which deployment serves it. If a model is deployed in multiple places (e.g., locally and on managed remote), routing picks based on your preferences.
Engine selection. For local and remote self-hosted serving, Syntax's autotuner picks the most efficient serving engine for the chosen model and your hardware. See Differentiators → Multi-engine inference.

You can override either layer. Aliases let you pin a name to a specific deployment; per-deployment configuration lets you override engine choices when you need to.
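As a rough illustration of the routing layer and the alias override, the sketch below resolves a requested model name to one deployment. It is not Syntax's actual API: `Deployment`, `resolve`, the target strings, and the idea of expressing preferences as an ordered list of targets are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical names throughout: Deployment, resolve, and the preference
# list are illustrative only and do not come from the Syntax codebase.

@dataclass(frozen=True)
class Deployment:
    model: str
    target: str  # e.g. "local", "remote-self-hosted", "managed-remote", "hosted"

def resolve(name, deployments, preferred_targets, aliases=None):
    """Pick the deployment that serves a request for `name`."""
    aliases = aliases or {}
    # An alias pins a name to one specific deployment and bypasses routing.
    if name in aliases:
        return aliases[name]
    candidates = [d for d in deployments if d.model == name]
    if not candidates:
        raise LookupError(f"no deployment for {name!r}")
    # Otherwise choose the candidate whose target ranks earliest in the
    # preference order; unknown targets sort last.
    rank = {t: i for i, t in enumerate(preferred_targets)}
    return min(candidates, key=lambda d: rank.get(d.target, len(rank)))

deployments = [
    Deployment("example-model", "managed-remote"),
    Deployment("example-model", "local"),
]
# Same model deployed twice: the preference order decides.
routed = resolve("example-model", deployments, ["local", "managed-remote"])
# An alias pins "fast" to the managed-remote deployment regardless of preferences.
pinned = resolve("fast", deployments, ["local"], aliases={"fast": deployments[0]})
```

Here `routed` is the local deployment because "local" comes first in the assumed preference list, while `pinned` ignores that list because the alias names a deployment directly.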

Multi-model deployments

When you deploy a multi-model party — a Main Agent, a Default Sub-Agent, and up to six Specialists — the inference plane plans holistically:

All models in the party share the same target (local, remote self-hosted, or managed remote). Targets cannot be mixed within a party.
The autotuner places each model on the available hardware in role order so the Main Agent gets the best resources.
VRAM pressure is relieved by tier when needed: specialists first, the sub-agent second, the Main Agent only as a last resort. Eligible smaller models can fall back to CPU automatically.
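The tiered relief order described above can be sketched as follows. This is a minimal model under stated assumptions, not the autotuner's implementation: `Placed`, `plan_relief`, the `cpu_eligible` flag, the VRAM figures, and the "relieved" label are hypothetical, and the text does not specify what relief means for a model that cannot fall back to CPU.

```python
from dataclasses import dataclass

# Hypothetical sketch: the role names follow the text, but the sizes, the
# cpu_eligible flag, and the action labels are assumptions for illustration.

# Tiers are relieved in this order: specialists, then the sub-agent,
# and the Main Agent only as a last resort.
RELIEF_ORDER = ["specialist", "sub_agent", "main_agent"]

@dataclass
class Placed:
    name: str
    role: str
    vram_gb: int
    cpu_eligible: bool = False

def plan_relief(models, vram_budget_gb):
    """Return (kept_on_gpu, moved) such that the kept models fit the budget."""
    kept = list(models)
    moved = []
    used = sum(m.vram_gb for m in kept)
    for role in RELIEF_ORDER:
        for m in [m for m in kept if m.role == role]:
            if used <= vram_budget_gb:
                return kept, moved  # pressure relieved, stop early
            kept.remove(m)
            used -= m.vram_gb
            # Eligible models fall back to CPU; others are only marked.
            moved.append((m.name, "cpu-fallback" if m.cpu_eligible else "relieved"))
    return kept, moved

party = [
    Placed("main", "main_agent", 40),
    Placed("sub", "sub_agent", 16),
    Placed("spec-a", "specialist", 8, cpu_eligible=True),
    Placed("spec-b", "specialist", 8),
]
kept, moved = plan_relief(party, vram_budget_gb=60)
```

With a 60 GB budget and 72 GB requested, both specialists are moved before the sub-agent is touched, and the Main Agent stays on the GPU.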

Targets in depth

Local inference — GPU / Apple Silicon / CPU on your own machine.
Remote self-hosted — your own SSH-reachable hardware.
Managed remote — dUX-backed cloud GPU.
Hardware support — what runs on what.
Multimodal capabilities — image, video, audio, 3D, time-series forecasting.
