Syntax

How Syntax works

A high-level walkthrough of Syntax's three planes, the Bridge, and what happens to a request from your editor to a model and back.

Syntax is built around three planes that work together but stay clearly separated. You don't have to understand every layer to use Syntax — but the mental model below is enough to predict what will happen for any given configuration.

The three planes

PlaneWhat it owns
ControlIdentity, organization policy, secrets, budgets, audit logs.
ExecutionYour sessions, the harness lifecycle, the local proxy, approvals, tool orchestration.
InferenceThe model catalog, hardware detection, engine selection, autotuning, model lifecycle.

The three planes are deliberately decoupled. The control plane never sees the content of your sessions; the execution plane never has to think about how a model is autotuned for your specific GPU; the inference plane never reaches into your editor.

First-class inter-compatibility

Every supported coding assistant talks to a single OpenAI- and Anthropic-compatible endpoint on localhost. That endpoint is the Bridge — the piece of Syntax that accepts requests in the format your harness already speaks and routes them to the right backend.

What happens to a request

  1. Your harness sends a chat request to its configured endpoint, which is actually Syntax's local Bridge.
  2. The Bridge resolves the requested model name against your active model policy (alias resolution, tier overrides, budget caps).
  3. The Bridge picks a backend — local engine, remote self-hosted engine, dUX-managed remote, or a hosted provider — based on what's deployed and what your policy allows.
  4. The chosen backend serves the request. Local serving uses the most efficient engine for your hardware (see Multi-engine inference).
  5. Tokens stream back to your harness in the wire format it expects, so streaming, tool calls, reasoning, and multimodal content all render correctly.

What you control

Syntax exposes a small set of high-leverage knobs:

  • Model policy — which models are allowed for which tiers, with aliases and per-deployment overrides.
  • Routing strategy — Latency vs Throughput, Performance vs Economy on multi-host deploys, public vs private endpoint exposure.
  • Approvals — what tool calls your harness is allowed to run without asking, and how risky operations get gated.
  • Budgets — hard caps and soft warnings for tokens or compute, per user and per organization.

Where this plays out