How Syntax works
A high-level walkthrough of Syntax's three planes, the Bridge, and what happens to a request from your editor to a model and back.
Syntax is built around three planes that work together but stay clearly separated. You don't have to understand every layer to use Syntax — but the mental model below is enough to predict what will happen for any given configuration.
The three planes
| Plane | What it owns |
|---|---|
| Control | Identity, organization policy, secrets, budgets, audit logs. |
| Execution | Your sessions, the harness lifecycle, the local proxy, approvals, tool orchestration. |
| Inference | The model catalog, hardware detection, engine selection, autotuning, model lifecycle. |
The three planes are deliberately decoupled. The control plane never sees the content of your sessions; the execution plane never has to think about how a model is autotuned for your specific GPU; the inference plane never reaches into your editor.
First-class inter-compatibility
Every supported coding assistant talks to a single OpenAI- and
Anthropic-compatible endpoint on localhost. That endpoint is the Bridge
— the piece of Syntax that accepts requests in the format your harness already
speaks and routes them to the right backend.
What happens to a request
- Your harness sends a chat request to its configured endpoint, which is actually Syntax's local Bridge.
- The Bridge resolves the requested model name against your active model policy (alias resolution, tier overrides, budget caps).
- The Bridge picks a backend — local engine, remote self-hosted engine, dUX-managed remote, or a hosted provider — based on what's deployed and what your policy allows.
- The chosen backend serves the request. Local serving uses the most efficient engine for your hardware (see Multi-engine inference).
- Tokens stream back to your harness in the wire format it expects, so streaming, tool calls, reasoning, and multimodal content all render correctly.
What you control
Syntax exposes a small set of high-leverage knobs:
- Model policy — which models are allowed for which tiers, with aliases and per-deployment overrides.
- Routing strategy — Latency vs Throughput, Performance vs Economy on multi-host deploys, public vs private endpoint exposure.
- Approvals — what tool calls your harness is allowed to run without asking, and how risky operations get gated.
- Budgets — hard caps and soft warnings for tokens or compute, per user and per organization.
Where this plays out
- The Bridge — what it is and why every harness talks to it — is covered in Differentiators → First-class inter-compatibility.
- The catalog and inference engines are covered in Inference.
- The dUX-backed managed remote story is in Syntax × dUX.