# Syntax Documentation — Full Corpus

Source: https://docs.syntax-ftc.com/llms-full.txt — see https://docs.syntax-ftc.com/llms.txt for the shorter index.

---

# Frequently Asked Questions

> Quick answers to the questions that come up most often when evaluating or starting with Syntax.

Permalink: https://docs.syntax-ftc.com/docs/faq

## Is Syntax open source?

Syntax is a proprietary product — your Fully Managed, Privately Owned,
General-Purpose AI Factory, built on dUX. The Syntax platform itself is
free of additional charge; you pay only for the infrastructure it runs
on, via dUX.

## What platforms are supported?

macOS 12+, Linux (Ubuntu 20.04+ / Debian 11+ / Fedora 38+ / equivalent
glibc-based distros), and Windows 10 (21H2)+ / Windows 11. x86_64 and
ARM64 are both supported. See [Getting Started →
Install](/docs/getting-started/install-macos).

## Do I need a GPU?

No. Syntax runs without a GPU; smaller models serve on CPU and larger
requests can route to hosted providers. With a GPU (NVIDIA, AMD ROCm,
or Apple Silicon) you can run larger models locally.

## Which coding assistants does Syntax support?

The Syntax CLI, Codex, Claude Code, OpenCode, and Pi. You can use
multiple at once; they share the same active model policy. See
[Harnesses → Overview](/docs/harnesses/overview).

## Does Syntax send my code to a server?

Only if you choose to. Syntax runs on infrastructure you own — your
machine for local serving, your boxes for self-managed remote, your
dUX-managed environment for managed remote. Your sessions stay within
your boundary unless you've configured a hosted provider. The control
plane never sees your session content.

## How does Syntax make money?

The Syntax platform itself is free of additional charge. You pay for
the infrastructure it runs on, via dUX.

## Can I use my own provider keys?

Yes. OpenAI, Anthropic, Google, and other supported providers all work
with your own API keys. Syntax routes requests through the Bridge so
the keys are stored within your environment and never leave your
control.

## How does Syntax compare to running an OpenAI-compatible proxy myself?

A custom proxy gets you the routing primitive. Syntax adds:

- Hardware-aware multi-engine inference for local serving.
- Multi-model party deployments.
- Managed remote inference (with dUX).
- Per-harness `connect`/`disconnect` so you don't manually edit each
  tool's config.
- Plan Mode, Agent Handoff, Runtime Modes.
- Budgets, exposed-endpoint bearers, audit.

If a custom proxy gets you 70% of what you need, Syntax gets you 100%
without you maintaining it.

## Can I share a deployed model with another tool?

Yes. Issue a per-deployment exposed-endpoint bearer (`sk-syntax-…`) from
the desktop app. The bearer is scoped to a single deployment and can be
revoked at any time. The exposed URL is OpenAI- and Anthropic-compatible
so any tool that speaks either ecosystem can use it.

## Does Syntax work offline?

Local inference works offline by definition. Hosted-provider routing
needs network. Managed remote needs network to dUX. The desktop app
itself runs offline.

## Can Syntax run on on-prem, air-gapped infrastructure?

Absolutely. Syntax fully supports on-prem and air-gapped deployments.
You run a dUX server on your internal network, and it orchestrates
whatever compute Syntax needs from the hardware you've made available
behind your perimeter. You then choose which OSS models to import into
the internal catalog, and from there the internal dUX server manages
your private model and image registries — so every user on your private
network gets the full Syntax experience without ever leaving the walled
garden.

## How does Plan Mode differ from "just asking the agent to plan first"?

Plan Mode separates the planning context from the execution context.
The executor starts fresh and works from the approved plan, not from
the back-and-forth that produced it. That's structurally different from
"please plan first then execute" in the same conversation. See
[Concepts → Plan Mode](/docs/concepts/plan-mode).

## What's "Agent Handoff" for?

When a session fills the context window, Syntax writes a structured
snapshot and resumes in a fresh context. This avoids the drift that
in-place compaction causes on long sessions. See
[Concepts → Agent Handoff](/docs/concepts/agent-handoff).

## How are AI agents supposed to consume these docs?

Fetch [`/llms-full.txt`](/llms-full.txt) for the full corpus, or
[`/llms.txt`](/llms.txt) plus per-page `/api/mdx/<slug>` for targeted
retrieval. There's also a [JSON sitemap](/api/sitemap.json). See
[Differentiators → AI-agent-friendly](/docs/differentiators/ai-agent-friendly).

## I have more questions

The [Glossary](/docs/glossary) covers terminology. For anything that
isn't covered, [contact the Syntax team](https://www.syntax-ftc.com/#contact).

---

# Glossary

> A short, capability-level glossary of Syntax-specific terms.

Permalink: https://docs.syntax-ftc.com/docs/glossary

This glossary covers Syntax-specific terminology you'll see across the
wiki and the product UI. It deliberately stays at capability level —
public, user-visible names — so it's safe to share with anyone
evaluating Syntax.

## Agent Handoff

A structured checkpoint Syntax writes when a session approaches the
context-window limit. A fresh agent picks up from the snapshot, so the
session continues cleanly instead of in-place compacting. See
[Concepts → Agent Handoff](/docs/concepts/agent-handoff).

## Bridge

The local OpenAI- and Anthropic-compatible endpoint every Syntax
integration talks to. See
[Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility).

## Catalog

The set of models Syntax knows how to deploy. Includes hundreds of
open-weight and provider-hosted models across many model purposes.

## Default Sub-Agent

The cheaper model in a multi-model party that the main agent
delegates routine work to.

## dUX

The cloud-GPU orchestrator Syntax integrates with for managed remote
inference. Syntax submits deployment intents; dUX handles GPU
placement, scaling, drivers, and ingress inside your cloud accounts —
you remain the sole admin of the underlying machines.

## Exposed endpoint

A per-deployment OpenAI-compatible URL plus a bearer token (`sk-syntax-…`)
that lets external clients reach a single deployed model. Issued and
revoked from the desktop app.

## Harness

A coding assistant that talks to an LLM. The Syntax CLI, Codex,
Claude Code, OpenCode, and Pi are the five supported harnesses. See
[Harnesses → Overview](/docs/harnesses/overview).

## Local inference

Running a model on your own machine — GPU, Apple Silicon, or CPU.

## Main Agent

The model your harness primarily talks to in a multi-model party. The
main agent can call specialists as tools.

## Managed Remote

dUX-backed cloud GPU deployment.

## Modality

The kinds of input/output a model handles. Common modalities include
text, image, video, and audio.

## Model Purpose

A coarse classification of what a model is for. Examples: text
generation, embedding, OCR, image processing, video processing,
image generation, video generation, segmentation, TTS, audio
generation, mesh recovery, UI grounding, audio transcription,
speech-to-speech, time-series forecasting.

## Party

A deployment with a Main Agent, a Default Sub-Agent, and up to six
Specialists composed together. See
[Concepts → Party Builder](/docs/concepts/party-builder).

## Plan Mode

A two-phase agent workflow: the agent proposes a structured plan, you
accept, and a fresh execution context carries it out. See
[Concepts → Plan Mode](/docs/concepts/plan-mode).

## Preset

A schema-versioned ready-to-deploy party definition. Lets a team share
a common multi-model configuration in one click.

## Remote Self-Hosted

A deployment target where Syntax controls a remote box you've provided
(your own server, your own GPU, your own SSH).

## Runtime Mode

The three-state cycle (Default / AutoEdit / Bypass) that controls how
cautious Syntax is about running tool calls without asking. See
[Concepts → Runtime Modes](/docs/concepts/runtime-modes).

The three states are:

- **Default** — asks before running anything that touches your
  filesystem, runs a shell command, or makes a network call. Reads are
  usually unattended.
- **AutoEdit** — auto-approves common edits and routine commands; only
  asks for genuinely risky operations.
- **Bypass** — approves everything without asking. Entering Bypass
  requires an explicit Y/N confirmation; it is not the default mode you
  cycle into accidentally.

## Specialist

A model in a party, beyond the main agent and the sub-agent, that
provides a specific capability and is exposed to the main agent as a
tool.

## Strategy

On multi-host deployments, the choice between **Performance** (one
model per host) and **Economy** (pack onto the fewest hosts).

## Tier (deployment)

For managed remote, **Latency** vs **Throughput** — two different
optimization profiles dUX uses when placing your deployment.

## syntax connect / syntax disconnect

CLI commands (and equivalent UI buttons) that wire a coding assistant to
the local Bridge — and unwire it cleanly.

---

# Welcome to Syntax

> Syntax is your Fully Managed, Privately Owned, General-Purpose AI Factory.

Permalink: https://docs.syntax-ftc.com/docs

Syntax is your Fully Managed, Privately Owned, General-Purpose AI Factory.

Syntax serves as a universal backend for AI that allows you to immediately
deploy both proprietary and open-source models. Through Syntax, you can
automatically deploy open models to your existing cloud instances, or let
Syntax auto-provision the necessary infrastructure. This auto-provisioning
takes place on a unified, secure, and private managed account fully owned by
you, spanning dozens of public cloud providers via dUX, our fully-featured
compute resources orchestrator.

Take full control over which providers and regions to use. You can implement
custom per-user budget controls and IAM rules, integrate your own
secrets-management tool to ensure privacy, and monitor your deployments and
costs across multiple providers from a single control plane.

The platform supports a rapidly expanding catalog, includes frontier-class and
lightweight LLMs, alongside models for Visual Understanding, Image/Video/Audio
Generation, Search & Embeddings, Time-Series, 3D, Safety & Guardrails, and
much more.

Syntax features its own Codex-based `syntax-cli` coding harness, which
supports a unique "Models Party" configuration. Additionally, it seamlessly
connects and deploys OSS models for use with third-party environments like
OpenAI's Codex, Claude Code, Pi, and OpenCode.

Choose any model from the catalog, including cutting-edge OSS models, deploy
it with a single click, connect your preferred coding harness via a simple CLI
command, and start building.

You can also sign in with your OpenAI Plus/Pro subscription to deploy OpenAI
models alongside your OSS stack, ensuring optimal token utilization across
simple, everyday, and highly complex tasks.

## Security and Isolation

Neither Syntax nor dUX will ever access your machines, data, logs, files, or
code. Furthermore, by integrating dUX with your private secrets manager, you
establish a technical guarantee that we cannot access any of your resources
under any circumstances. Additionally, enterprise customers are automatically
deployed as fully isolated tenants, ensuring complete infrastructure
separation from all other dUX or Syntax clients.

## Documentation Overview

- **Introduction** — What Syntax is and how it works.
- **Getting Started** — Installation guides for macOS, Linux, and Windows;
  first launch instructions; and connecting your first harness.
- **Harnesses** — Supported coding assistants and how to point them toward
  Syntax.
- **Inference** — Local (GPU/CPU/Apple Silicon), remote self-hosted, and
  managed remote inference via dUX.
- **Models** — Supported model categories, modalities, reasoning capabilities,
  tool use, and licensing.
- **Concepts** — Party Builder, Plan Mode, Agent Handoff, Runtime Modes,
  Memory, Exposed Endpoints, and Observability.
- **Syntax × dUX** — How Syntax integrates with dUX for managed remote
  inference.
- **CLI Reference** — The two binaries — `syntax` (the application
  command) and `syntax-cli` (the bundled coding harness) — and their
  user-visible flags.
- **Differentiators** — Why engineering teams choose Syntax.

## For AI agents

Every page in this wiki is also available as raw Markdown:

- The full corpus is available at [`/llms-full.txt`](/llms-full.txt).
- A short index is available at [`/llms.txt`](/llms.txt).
- Each page exposes a Markdown source via `/api/mdx/<slug>`.
- A JSON sitemap of all pages is available at
  [`/api/sitemap.json`](/api/sitemap.json).

---

# User-visible flags reference

> The flags you'll actually use day-to-day across `syntax` and `syntax-cli`.

Permalink: https://docs.syntax-ftc.com/docs/cli/flags

This page lists the user-visible flags you'll see across the two
Syntax binaries:

- **`syntax`** — the application command (`connect`, `disconnect`,
  `deploy`, `doctor`, `models`, `sessions`, `memory`).
- **`syntax-cli`** — the bundled coding harness (interactive TUI and
  the `exec` subcommand for headless runs).

For the authoritative flag list, run `<command> --help` directly —
that output reflects the actual flags shipped in your installed
version.

Placeholders use `ALL_CAPS` to avoid conflicts with command-line angle
brackets in tables — e.g., `--model ID` means "supply a model ID
where `ID` is shown".

## Global flags

These work on both `syntax` and `syntax-cli`:

| Flag | Meaning |
|---|---|
| `--version` | Print the running Syntax version. |
| `--help`, `-h` | Subcommand-specific help. |
| `--quiet` | Suppress non-essential output. |
| `--verbose` | Extra output (without enabling tracing). |
| `--no-color` | Disable color in terminal output. |

## `syntax-cli` (interactive harness)

| Flag | Meaning |
|---|---|
| `--resume` | Resume a recent session. |
| `--model ID` | Override the default model for this session. |
| `--mode MODE` | Start in a specific Runtime Mode (`default`, `autoedit`, or `bypass`). Bypass still requires interactive confirmation. |
| `--plan` | Start in Plan Mode. |
| `--no-color` | Disable color. |

## `syntax-cli exec` (headless)

| Flag | Meaning |
|---|---|
| `--input PATH` | Read the task from a file instead of the command line. |
| `--output PATH` | Write structured output to a file. |
| `--model ID` | Override the default model. |
| `--max-turns N` | Cap the number of agent turns. |
| `--policy PATH` | Load an approval policy from a file. |
| `--json` | Emit structured JSON output (for piping). |

## `syntax connect`

| Flag | Meaning |
|---|---|
| `list` | List currently connected harnesses. |
| `--dry-run` | Print what would change without modifying anything. |

## `syntax deploy`

| Flag | Meaning |
|---|---|
| `--target TARGET` | Deployment target: `local`, `self-managed-remote`, or `managed-remote`. |
| `--tier TIER` | Deployment tier: `performance` or `cost-optimized`. |
| `--expose-private` | Issue a private exposed bearer. |
| `--expose-public` | Issue a public exposed bearer. |
| `--profile NAME` | Use a saved party profile. |

## Where to go next

- [Syntax Coding Harness](/docs/cli/syntax-cli)
- [`syntax connect`](/docs/cli/syntax-connect)
- [CLI overview](/docs/cli/overview)

---

# CLI overview

> The two Syntax binaries — `syntax` (the application command) and `syntax-cli` (the bundled coding harness) — and what each one is for.

Permalink: https://docs.syntax-ftc.com/docs/cli/overview

Syntax ships with two distinct binaries:

- **`syntax`** is the umbrella command for the Syntax application
  itself — connecting and disconnecting harnesses, deploying models,
  managing the catalog, inspecting sessions, running diagnostics.
- **`syntax-cli`** is the bundled coding harness — the interactive TUI
  agent. It's a sibling of Codex, Claude Code, OpenCode, and Pi from a
  conceptual standpoint, but it ships with Syntax and is always
  available without a `syntax connect` step.

Anything you can do in the desktop app for application-level
operations, you can do via `syntax`. The interactive coding experience
lives in `syntax-cli` (and in the desktop app, which shares the same
agent core).

## `syntax` commands at a glance

| Command | Purpose |
|---|---|
| `syntax connect <agent>` | Wire a coding assistant to the Bridge. |
| `syntax disconnect <agent>` | Restore a coding assistant's original config. |
| `syntax doctor` | Self-check: hardware, deps, network, deployments. |
| `syntax deploy` | Deploy a model or party from the CLI. |
| `syntax models` | Browse / search the catalog. |
| `syntax sessions` | List / inspect / resume past sessions. |
| `syntax memory` | Inspect or edit Layer 1 / Layer 2 memory. |
| `syntax --version` | Print the running Syntax version. |
| `syntax --help` | Top-level help. |

Subcommand-specific help is available on every command:

```bash
syntax connect --help
syntax deploy --help
```

## `syntax-cli` commands at a glance

| Command | Purpose |
|---|---|
| `syntax-cli` | Start an interactive coding session (TUI). |
| `syntax-cli --resume` | Resume a recent session. |
| `syntax-cli exec` | Run a one-shot agent task headlessly (CI / scripting). |
| `syntax-cli --help` | Harness-specific help. |

## When to use the CLI

- **Interactive coding in a terminal.** Run `syntax-cli` to start a TUI
  session.
- **Scripted automation.** `syntax-cli exec` runs a single task
  headlessly with structured I/O — fits naturally into CI pipelines,
  cron jobs, and pre-commit hooks.
- **Setup / teardown.** `syntax connect`, `syntax disconnect`,
  `syntax doctor`, `syntax deploy` are one-liners that don't need a
  GUI.
- **Remote shells.** When you're SSH'd into a box and just need a
  coding agent there.

## When the desktop app is better

- Browsing the catalog visually and composing parties.
- Watching multi-deployment fleets at a glance.
- Issuing / revoking exposed-endpoint bearers (the bearer is shown
  exactly once, and copying from a popup is more reliable than
  copying from a terminal).
- Configuring hardware / providers / managed remote targets.

## Where to go next

- [`syntax-cli`](/docs/cli/syntax-cli) — the interactive coding harness.
- [`syntax connect`](/docs/cli/syntax-connect) — harness wiring.
- [Flags](/docs/cli/flags) — the user-visible flags reference.

---

# Syntax Coding Harness

> The Syntax coding harness — TUI sessions with Plan Mode, Runtime Modes, and the full agent experience, started via `syntax-cli`.

Permalink: https://docs.syntax-ftc.com/docs/cli/syntax-cli

The Syntax coding harness is launched with `syntax-cli`. It's a
distinct binary from the top-level `syntax` command (which is the
umbrella for the Syntax application itself — connect, deploy, doctor,
models, sessions, memory, and so on). `syntax-cli` ships bundled with
Syntax; no separate install or `syntax connect` step is required.

## Starting a session

```bash
syntax-cli
```

Starts a fresh interactive session in the current working directory.
The agent sees the directory's contents (subject to your Runtime Mode
and any ignore patterns).

## Resuming a session

```bash
syntax-cli --resume
```

Lists recent sessions and lets you pick one to resume. Resumed
sessions inherit their previous state, including any Plan Mode plan,
the working tree at handoff, and Layer-2 memory.

## Working with files

The harness has the full set of built-in tools — file editing, shell
execution, web search, MCP integrations, the Skills framework, and any
specialists deployed in the active party. Tool calls are gated by your
active Runtime Mode (see
[Concepts → Runtime Modes](/docs/concepts/runtime-modes)).

## Keyboard

A few terminal-specific keybindings:

| Key | Effect |
|---|---|
| `Ctrl+M` | Cycle Runtime Mode (Default → AutoEdit → Bypass → Default). |
| `Esc` | Cancel the current turn — works pre-turn, mid-stream, and over popups. |
| `Ctrl+C` | Soft-quit the session. |
| `Ctrl+D` | Hard-quit. |

## Plan Mode in the harness

Plan Mode is a first-class harness experience. Toggle Plan Mode for
the current session and the agent enters the plan-then-execute split
described in [Concepts → Plan Mode](/docs/concepts/plan-mode).
Approved plans persist to disk so you can re-execute them later.

## Headless / scripted: `syntax-cli exec`

For non-interactive use:

```bash
syntax-cli exec "fix the failing test in tests/foo.py and add a regression test"
```

`exec` runs a single task without a TUI and exits. Approvals are
governed by the active policy rather than per-call confirmation;
output is structured for piping into other tools.

## Where to go next

- [`syntax connect`](/docs/cli/syntax-connect)
- [Flags](/docs/cli/flags)
- [Harnesses → Syntax CLI](/docs/harnesses/syntax-cli)

---

# Syntax connect

> Wire a coding assistant to the local Bridge. Reversible.

Permalink: https://docs.syntax-ftc.com/docs/cli/syntax-connect

`syntax connect <agent>` edits the named harness's own configuration
to point at the local Bridge, and records the change so it can be
undone.

## Usage

```bash
syntax connect <agent>
```

Where `<agent>` is one of:

- `codex`
- `claude-code`
- `opencode`
- `pi`

The Syntax CLI is bundled with Syntax and doesn't take a `connect`
step — it's available the moment Syntax is installed.

## What happens

1. Detects whether the named harness is installed.
2. Locates its configuration file in the standard location for your
   OS.
3. Backs up the configuration in a Syntax-managed ledger.
4. Edits the configuration to point at the local Bridge.
5. Applies any harness-specific normalizations.
6. Records the change.

## Disconnecting

```bash
syntax disconnect <agent>
```

Restores the harness's original configuration from the ledger and
removes the ledger entry. If the harness has been removed since
connection, `disconnect` cleans up gracefully.

## Listing connections

```bash
syntax connect list
```

Shows every harness that's currently connected.

## Multiple connections

You can connect multiple harnesses simultaneously. They share the
same Bridge, the same active model policy, and the same approvals.

## Detection failures

If the named harness isn't installed, `syntax connect` prints the
upstream install instructions and exits without making changes. No
ledger entry is created, so a subsequent install + connect works
cleanly.

## Where to go next

- [Harnesses overview](/docs/harnesses/overview)
- [Connecting a harness](/docs/getting-started/connecting-a-harness)

---

# Agent Handoff

> When the context window fills up, Syntax writes a structured handoff and starts fresh — instead of in-place compaction that loses the thread.

Permalink: https://docs.syntax-ftc.com/docs/concepts/agent-handoff

Long sessions inevitably fill the context window. Most agents handle
this with **compaction**: throwing away the older parts of the
conversation, sometimes summarizing them, sometimes just truncating. The
result is that the agent loses track of the original goal, forgets
decisions you discussed early on, and starts to drift.

Syntax takes a different approach. When the context window approaches
its limit (around 80–90% utilization), Syntax does an **Agent Handoff**:

1. The current agent writes a structured, schema-conformant snapshot of
   the work so far — the goal, the decisions, the files touched, the
   open questions, the next steps.
2. The snapshot is saved to durable storage in your home directory.
3. A fresh agent starts with an empty context primed only by a "resume"
   instruction that points at the snapshot.

The new agent reads the snapshot, picks up where the previous one left
off, and keeps going. The user-visible effect is "the conversation feels
like it just keeps going indefinitely". The technical effect is "every
turn always has a clean context window".

## Why this is better than compaction

Compaction is lossy by construction. Either you summarize (and the
summary is wrong in subtle ways) or you truncate (and the agent forgets
what mattered). Handoff is an explicit checkpoint: the schema captures
exactly what's needed to resume, and the new agent doesn't carry any of
the old turn-by-turn churn.

## What's in a handoff

The schema is intentionally lean. Roughly:

- The original goal and any clarifying answers.
- A summary of what's been done so far, organized by step.
- The list of files that have been changed, with intent (e.g., "edited
  but not yet tested").
- Open questions that need answers before the next step.
- The next step.

## What happens to the old conversation

It stays in your session history. If you want to scroll back, search
it, or fork from a particular point, it's there. The handoff doesn't
delete anything — it's a resume primitive, not an archive primitive.

## Compaction is still the fallback

The classic in-place compaction (`/compact` in supported harnesses) is
still available if you want it for a specific session. Handoff is the
default at the long-context threshold, but you can always force the
older behavior.

## Related concepts

- [Plan Mode](/docs/concepts/plan-mode) — the front-end version of the
  same idea: separate planning context from execution context.
- [Runtime Modes](/docs/concepts/runtime-modes) — how individual tool
  calls are gated within a turn, regardless of context length.

---

# Exposed endpoints

> Per-deployment OpenAI-compatible bearer tokens you can issue and revoke from the desktop app.

Permalink: https://docs.syntax-ftc.com/docs/concepts/exposed-endpoints

By default, the local Bridge is reachable only from `localhost`.
That's the safe default — your harness on the same machine can
reach it; nothing else can. But sometimes you want to share a
single deployed model with another tool or a teammate. That's what
**exposed endpoints** are for.

## What an exposed endpoint is

An exposed endpoint is a deployment-scoped, OpenAI-compatible URL
plus a bearer token. The bearer:

- Starts with `sk-syntax-` followed by a random suffix.
- Is shown to you exactly once when you issue it. After that, it's
  not retrievable from Syntax — only the masked form is shown.
- Is scoped to a single deployment. Other models on the same
  Syntax install are not reachable through this bearer.
- Can be revoked at any time. Revocation is immediate.

## Two flavors of exposure

When you deploy a model, you can pick:

- **Expose private** — the endpoint is reachable from your other
  Syntax tools (e.g., a teammate's machine on the same internal
  network) but not from the public internet.
- **Expose public** — the endpoint is reachable from anywhere with
  the bearer. Use this when you want to share with a non-Syntax
  tool that can't reach your private network.

You can pick either, both, or neither. Most one-off sharing flows
just use a private exposure; public exposure is for cases where
you genuinely need a publicly reachable URL.

## What an exposed bearer can do

A bearer can call the deployment's OpenAI- and Anthropic-compatible
inference surface (chat, messages, and model listing for the scoped
model — whichever apply to the model's modality).

It **cannot**:

- Call any other deployment.
- Issue or revoke other bearers.
- Modify deployments.
- Reach Syntax's settings or control plane.

## Issuing and revoking

From the desktop app's **Active Deployments** view:

- Click **Expose** on a deployment to issue a bearer. The bearer is
  shown once with a "Copy" button. After you close the modal,
  you'll only see the masked form.
- Click **Revoke** to invalidate the bearer immediately. New
  requests fail; in-flight requests complete normally.

## When this matters

- **Sharing with a non-Syntax tool.** Any tool that speaks an
  OpenAI-compatible API can use the exposed bearer.
- **Multi-machine workflows.** Run Syntax on one machine, run a
  bot or a backend on another, and let the backend reach a
  deployed model through a private exposure.
- **Stable URLs for a project.** Issue one bearer for a project,
  share it with the project's team, revoke when the project ends.

## Where to go next

- [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) —
  the Bridge behind the URL.
- [Inference → Managed remote](/docs/inference/managed-remote) —
  managed deployments support both private and public exposures.

---

# Memory

> Two-layer memory — always-on file-based memory plus an opt-in retrieval pipeline for session-spanning recall.

Permalink: https://docs.syntax-ftc.com/docs/concepts/memory

Long-running coding work needs memory that survives across sessions.
Syntax's memory system has two layers, designed so the always-on
layer is simple and predictable while the optional second layer adds
sophisticated retrieval when you want it.

## Layer 1 — always on

A canonical memory file in your project (and a per-user one) is
loaded into every agent context automatically. Layer 1 memory is:

- **File-based.** It's a normal Markdown file you can read, edit, and
  diff like any other file.
- **Opt-in by content.** You write what you want remembered; the
  agent doesn't auto-write here.
- **Always loaded.** Every agent turn sees it, so guidance you write
  there is enforced consistently.

This is the right layer for "always-true" facts about your project,
your preferences, or your conventions — things you want every agent
turn to know without having to repeat.

## Layer 2 — opt-in retrieval

Layer 2 is a richer system that lets the agent record discrete
memory entries and retrieve them on demand:

- **Hybrid retrieval.** Combines lexical and semantic search over a
  per-user memory store.
- **Per-turn auto-retrieval.** When enabled, every agent turn
  automatically pulls in memories relevant to the current input
  before producing a response.
- **`memory_search` tool.** The agent can also explicitly search
  memory mid-turn.
- **Schema-stable storage.** The on-disk shape is durable, so memory
  built up over many sessions stays intact across upgrades.

Layer 2 is the right layer for "things the agent learned" — facts
about your codebase, decisions made in earlier sessions, lessons
learned from failed approaches.

## When to use which

- **Layer 1** for: explicit guidance, conventions, "always-do" /
  "never-do" rules, project context that doesn't change much.
- **Layer 2** for: incidental learnings, debugging notes, history of
  decisions, anything you'd otherwise forget.

You can use both at once. Most teams start with Layer 1 alone and
adopt Layer 2 once they have multi-session workflows where recall
becomes valuable.

## Where memory lives

Memory storage lives in your home directory under predictable paths
that you can inspect and back up. Layer 1 is plain Markdown; Layer 2
is a small structured store you don't have to read by hand.

## Memory and Agent Handoff

When a session crosses the long-context threshold and Syntax does an
[Agent Handoff](/docs/concepts/agent-handoff), the memory system is
unaffected. Layer 1 is loaded fresh for the new agent; Layer 2's
retrieval works the same way. Memory is the right answer to "this
information should survive across handoffs and across sessions";
handoff is the right answer to "this conversation should keep going
indefinitely".

## Where to go next

- [Concepts → Agent Handoff](/docs/concepts/agent-handoff)
- [Concepts → Plan Mode](/docs/concepts/plan-mode)

---

# Observability

> Metrics, traces, crash dumps, and request logging for everything Syntax runs — controllable from the UI.

Permalink: https://docs.syntax-ftc.com/docs/concepts/observability

Syntax exposes a small but complete observability surface so you
can see what's happening at runtime without digging through log
files. The same controls work for local inference, self-hosted
remote, and managed remote deployments.

## What's instrumented

For every deployed model and every request through the Bridge,
Syntax collects:

- **Metrics.** Tokens in / out, latency, request count, error
  rate, deployment health.
- **Traces.** Per-request traces showing each step of the routing
  pipeline.
- **Crash dumps.** When an inference engine or a tool call
  crashes, Syntax keeps a dump on disk so you can investigate.
- **Request logs.** Optional structured logs of every chat
  request.

Each is independently configurable.

## Where you control it

The desktop app's **Observability** page lets you:

- **Toggle metrics, tracing, crash dumps, and request logging**
  per deployment and globally.
- Pick a tracing destination — local file, your own OpenTelemetry
  collector, or none.
- Pick a metrics destination — local Prometheus-compatible
  endpoint, your own Prometheus scraper, or none.
- Set request-log retention.

The same controls work for local, self-hosted remote, and
dUX-managed remote deployments. Managed-remote observability data
flows back through dUX into Syntax so you don't need a separate
observability stack on the cloud side.

## What's surfaced in the UI

The desktop app also surfaces a small set of high-leverage signals
inline so you don't have to look at metrics dashboards:

- **Active Deployments** shows live token / request counters per
  deployment.
- **Sessions** shows per-session usage and budget consumption.
- **Status** badges indicate when a deployment is degraded.
- A consolidated **Issues** panel groups warnings ("a deployment is
  out of memory", "a remote target stopped responding") so they're
  visible without you opening a log file.

## Privacy

Observability is local by default. Metrics, traces, and request
logs all stay on your machine unless you explicitly point them at
an external collector. Org admins can require that observability
data flows through the org's collector for audit purposes.

## When to enable each

- **Metrics**: always. Cheap, useful for capacity planning.
- **Tracing**: when you're debugging a routing issue or a slow
  request.
- **Crash dumps**: leave on; they don't cost anything until
  something crashes.
- **Request logging**: enable when you need to debug a specific
  flow or when org policy requires it. Otherwise leave off — it
  generates the most data.

## Where to go next

- [Inference → Overview](/docs/inference/overview) — what each
  inference target reports.

---

# Party Builder & Specialists

> Compose a strong main agent, a cheaper sub-agent, and up to six specialists into a single deployment.

Permalink: https://docs.syntax-ftc.com/docs/concepts/party-builder

Real coding workflows rarely fit a single model. You want a strong main
agent for hard problems, a cheap and fast sub-agent for everything else,
and sometimes one or more specialists for things like image
understanding, OCR, image generation, time-series forecasting, or other
non-text tasks.

The **Party Builder** is the UI and runtime that lets you compose those
together as a single deployment.

## The shape of a party

A party has up to eight slots:

| Role | Count | Purpose |
|---|---|---|
| **Main Agent** | 1 (required) | The model the harness primarily talks to. |
| **Default Sub-Agent** | 1 (required — can re-use main agent's model) | The cheaper model the main agent delegates routine work to. |
| **Specialist** | up to 6 (optional) | A model with a specific capability, exposed as a tool the main agent can call. |

Specialists can be any model in the catalog. Each gets an optional custom
instruction that the main agent sees when deciding whether to call it.

## How specialists are called

When you deploy a party, every specialist is registered with the main
agent as a tool, along with a structured description the main agent
can use to decide when to invoke it. The agent calls the tool; the
call is forwarded to the specialist; the specialist's response is
folded back into the conversation.

## Presets

The Party Builder ships with a **Presets** tab — schema-versioned
ready-to-deploy party definitions you can pick instead of composing one
yourself. Presets are useful for common workflows ("a coding party",
"a vision-and-coding party", "a document-processing party") and for
sharing standard configurations across a team.

## Capability scoring and plan generation

Before you deploy, the Party Builder shows:

- **Coverage**: which capabilities the chosen models cover (text,
  reasoning, image understanding, image generation, audio, etc.) and
  where there are gaps.
- **Strength**: a per-model strength bar so you can see which model is
  carrying which capability.
- **A deployment plan**: the expected hardware footprint of the party
  on local GPU, on managed remote, or on self-hosted remote.

For local and self-hosted deployments, the plan is computed by the
inference plane's autotuning logic, which knows how each model fits on
your hardware and where to put it relative to the others. For managed
remote, the plan is sent to dUX, which returns the placement.

## Where to deploy

A party deploys to any of the same targets as a single model:

- **Local** — one or more models on your own machine.
- **Self-Managed Remote** — your own SSH-reachable GPU box(es).
- **Managed Remote** — dUX-backed cloud.

The deployment process is the same in each case; only the underlying
hardware changes.

## When to build a custom party vs use a preset

- **Use a preset** if your workflow lines up with a common template.
- **Build a custom party** if you have a specific main model you trust,
  cheaper specialists you want to lean on for routine work, and
  capability requirements that aren't covered by presets.

## Where this connects

- The main and sub-agent slots are two of the three reasons for
  Multi-model parties — see
  [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties).
- The deployment targets are described in
  [Inference → Overview](/docs/inference/overview).
- The capability scoring system reuses the
  [Models → Purposes](/docs/models/purposes) taxonomy.

---

# Plan Mode

> Structured planning before execution — the agent proposes, you accept, then a fresh context fork actually runs the work.

Permalink: https://docs.syntax-ftc.com/docs/concepts/plan-mode

Plan Mode is a deliberate pause between "the agent has heard your
request" and "the agent starts changing things". It splits work into two
phases:

1. **Plan**: the agent proposes a structured plan you can read, refine,
   and approve.
2. **Execute**: a fresh context picks up the approved plan and carries it
   out, with the plan as durable state.

The split is intentional. Long-running execution agents tend to
accumulate scratch context that has nothing to do with the actual task;
when something goes wrong on step 12, none of that scratch helps. Plan
Mode keeps the planning context separate from the execution context so
the executor starts with exactly the relevant input and nothing else.

## What you see in Plan Mode

In Plan Mode, the agent:

- Asks clarifying questions if your request is ambiguous, and saves the
  answers.
- Reads the relevant code, docs, or external references.
- Produces a plan with: goal, the files it intends to change, the
  sequence of steps, and verification criteria.
- Stops. It does not start editing.

You can:

- Accept the plan as-is.
- Send corrections (e.g., "split step 3 into two", "skip step 5", "add a
  rollback note for step 7").
- Reject and re-plan.

## What happens after acceptance

When you accept a plan, Syntax forks a fresh execution context primed
with the plan as the canonical input. That executor doesn't see the back-
and-forth from the planning phase — it sees only the final approved
plan plus whatever runtime context it builds itself. This is one reason
plans tend to execute cleanly even when the planning conversation was
messy.

## Why this matters

- **Reviewability.** A plan is something you can scroll, share, paste in
  a PR description, or hand off to a teammate. A 10,000-token chat log
  isn't.
- **Determinism.** Two executors given the same plan should produce
  similar outputs. If yours don't, that's a signal worth investigating.
- **Failure containment.** If the executor goes off the rails on step
  9, it's much easier to resume from step 9 (with the original plan)
  than to reconstruct the intent from a tangled conversation.

## When Plan Mode isn't the right answer

Trivial one-shot tasks ("rename this variable", "format this file")
don't need a plan. Plan Mode is most valuable for multi-file changes,
investigations that span several systems, or anything you'd want a
written reference for after the fact.

## How to invoke

In supported harnesses, Plan Mode is a setting or a slash-command. The
desktop app exposes it as a toggle on the session view. Syntax CLI
accepts it as a flag.

## Related concepts

- [Agent Handoff](/docs/concepts/agent-handoff) — what happens when a
  long execution context fills up.
- [Runtime Modes](/docs/concepts/runtime-modes) — the gating layer that
  decides which tool calls run unattended.

---

# Runtime Modes

> Default, AutoEdit, and Bypass — the three-state cycle that gates which tool calls run unattended.

Permalink: https://docs.syntax-ftc.com/docs/concepts/runtime-modes

Runtime Modes control how cautious Syntax is about running tool calls
without asking you first. The cycle has three states:

| Mode | Behavior |
|---|---|
| **Default** | Asks before running anything that touches your filesystem, runs a shell command, or calls out to the network. Reads are usually unattended. |
| **AutoEdit** | Auto-approves common edits and routine commands; asks only for genuinely risky operations (e.g., destructive shell, network egress to non-allowlisted hosts). |
| **Bypass** | Approves everything without asking. **Requires an explicit Y/N confirmation to enter.** |

You cycle through the three states with `Ctrl+M` in the TUI (or the
equivalent toggle in the desktop app). The current mode shows in the
status bar as a small badge.

## Why three modes

Most coding sessions sit comfortably in **AutoEdit**. The cost of
approving every diff hunk is real, and the risk of an unwanted edit is
small for most operations. AutoEdit removes the friction without giving
up the gates that matter.

**Default** is the right place to be when you're working on something
unfamiliar, when you're sharing your screen, or when you want a strict
human-in-the-loop posture.

**Bypass** is the right place to be when you've validated the agent's
plan, you trust the approvals fence to be set elsewhere (e.g., a
sandbox), and you want maximum velocity. The explicit confirmation gate
exists because Bypass is genuinely riskier — it is *not* the default
posture and it is not the default mode you cycle into accidentally.

## What's gated regardless of mode

A small number of operations are always gated through a deterministic
classifier rather than the current Runtime Mode. These include
operations that delete data, force-push, modify shared infrastructure,
or upload content to third-party web tools. The Runtime Mode is a
*posture*, not an override; the deterministic classifier on truly risky
operations always runs.

## Cancellation

Independently of the mode, **`Esc`** always interrupts the current
turn. It works:

- Pre-turn (between submit and the agent actually starting).
- Mid-turn (during streaming or tool execution, even when the working
  spinner is hidden).
- With a popup active (popup dismisses first; second `Esc` cancels the
  turn).

## Related concepts

- [Plan Mode](/docs/concepts/plan-mode) — runs the planning phase
  before any tool call would have to be approved.
- [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) —
  the Bridge that receives the requests; budgets and approvals live
  here.

---

# Specialist Models

> Specialists are non-main models that a multi-model party exposes as tools to the main agent.

Permalink: https://docs.syntax-ftc.com/docs/concepts/specialist-models

A **Specialist** is any model in a multi-model party other than the
Main Agent and Default Sub-Agent. Specialists exist because real
coding work occasionally needs something the main model is bad at —
image understanding, OCR, segmentation, image generation, TTS, audio
transcription, time-series forecasting — and dragging the main model
through those tasks is wasteful.

## How specialists become tools

When a party deploys, every specialist is registered with the Main
Agent as a tool. The tool description comes from the catalog entry's
structured description, which tells the Main Agent what the
specialist is good at and when to call it.

The Main Agent then calls the specialist exactly the way it would
call any other tool. The Bridge intercepts the call, forwards it to
the specialist (with whatever payload is appropriate for the
specialist's modality), and folds the response back into the
conversation.

## Custom instructions per specialist

When you compose a party, every specialist gets an optional
instruction string. The instruction tells the Main Agent how to use
that specialist — for example, "use this specialist for OCR on
scanned PDFs, not for OCR on screenshots". Custom instructions are
the cheapest way to make a generic capability behave well in your
specific workflow.

## Cost & latency

Specialists are usually smaller and cheaper than the Main Agent.
Routing routine work to a specialist instead of the Main Agent
saves both tokens and time. The Party Builder's plan view shows
the expected cost shape so you can predict the savings before
deploying.

## Specialists that aren't LLMs

Many specialists are not LLMs at all — image generators,
segmenters, TTS, audio generators, mesh recovery, UI grounding,
time-series forecasters. Each surfaces as a purpose-specific tool
matched to its modality, so the Main Agent invokes the right tool
for the right kind of work.

## Where to go next

- [Concepts → Party Builder](/docs/concepts/party-builder)
- [Models → Purposes](/docs/models/purposes)
- [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties)

---

# Agent Handoff (vs compaction)

> Long sessions don't degrade — at the long-context threshold, Syntax does a structured handoff to a fresh context instead of in-place compaction.

Permalink: https://docs.syntax-ftc.com/docs/differentiators/agent-handoff

Almost every other agent stack handles "context window fills up" with
**compaction**: throwing away or summarizing the older parts of the
conversation. The result is the same in every case: the agent loses
the thread, forgets early decisions, and starts to drift.

Syntax does **Agent Handoff** instead. When the context approaches its
limit, the current agent writes a structured snapshot of the work so
far, persists it, and a fresh agent picks it up.

## Why this is structurally better

Compaction is lossy by construction:

- **Summarize**: the summary is wrong in subtle ways. The model is
  optimizing "shorter" without knowing what mattered.
- **Truncate**: the agent forgets exactly the thing it needs.
- **Both**: the new context is a mix of half-remembered earlier turns
  and full recent turns. The agent's behavior gets weird.

Handoff is an explicit checkpoint with a defined schema. The new
agent gets:

- The original goal.
- The decisions made so far.
- The files touched and the intent of each change.
- The open questions.
- The next step.

…and nothing else. No turn-by-turn churn. No half-remembered context.

## What you see

The user-visible effect: "the conversation feels like it just keeps
going indefinitely". The technical effect: "every turn always has a
clean context window".

For long-running agentic workflows, this is the difference between an
agent that finishes its task and an agent that spirals after turn 30.

## When to use compaction instead

Plain compaction is still available as a fallback if you want it for a
specific session. Handoff is the default at the long-context threshold,
but you can always force the simpler behavior.

## Where this lives

- [Concepts → Agent Handoff](/docs/concepts/agent-handoff) — the
  capability-level walkthrough.
- [Concepts → Plan Mode](/docs/concepts/plan-mode) — the front-end
  version of the same idea (separate planning context from execution
  context).

---

# AI-agent-friendly docs

> This wiki ships as a website AND as a Markdown corpus AI agents can ingest in a single fetch.

Permalink: https://docs.syntax-ftc.com/docs/differentiators/ai-agent-friendly

These docs are built to be useful to humans **and** to AI agents. Most
documentation sites assume only humans will read them. Syntax's docs
expose machine-readable surfaces alongside the rendered HTML so the
agents in your codebase can reason about Syntax's capabilities the same
way a developer can.

## What's exposed

| Surface | Path | Format |
|---|---|---|
| LLM index | [`/llms.txt`](/llms.txt) | Plain text (Markdown links) |
| Full corpus | [`/llms-full.txt`](/llms-full.txt) | Plain text (concatenated Markdown) |
| Per-page raw | `/api/mdx/<slug>` | text/markdown |
| JSON sitemap | [`/api/sitemap.json`](/api/sitemap.json) | application/json |
| XML sitemap | `/sitemap.xml` | XML |
| `robots.txt` | `/robots.txt` | text |

## How agents typically use it

- **One-shot ingestion**: fetch [`/llms-full.txt`](/llms-full.txt) once
  and the agent has the whole wiki in its context.
- **Targeted retrieval**: fetch [`/llms.txt`](/llms.txt) for a short
  index, pick the relevant page, then fetch its raw Markdown via
  `/api/mdx/<slug>`.
- **Programmatic discovery**: hit
  [`/api/sitemap.json`](/api/sitemap.json) to enumerate pages with
  metadata (title, description, tags, lastModified) and decide which to
  pull.

## Why it's a differentiator

For AI-coding teams evaluating Syntax, an agent in your codebase can
read these docs in one fetch and answer "should we use Syntax?" with
real grounded answers — not training-cutoff guesses. Once Syntax is
in use, the same surface lets your agents ask runtime questions ("how
do exposed endpoints work?", "which harnesses are supported?") and get
authoritative answers from a single canonical source.

## What's *not* exposed

These docs are public and capability-level. They describe **what**
Syntax does and **why** it's useful to you — not the internals of how
it does any of it. Internal protocols, source-level structure,
component boundaries, and implementation choices are deliberately not
in this corpus. If you need integration help that requires those
details, contact the Syntax team.

## Where to start

If you're an AI agent reading these docs, the recommended starting
points are:

- [`/llms.txt`](/llms.txt) — short index.
- [`/docs/introduction/what-is-syntax`](/docs/introduction/what-is-syntax)
  — the human-readable overview.
- [`/api/sitemap.json`](/api/sitemap.json) — programmatic page list.

---

# First-class inter-compatibility

> The harnesses you already use and the models you actually want to run — both sides keep their full capabilities, and nothing in the middle is degraded.

Permalink: https://docs.syntax-ftc.com/docs/differentiators/first-class-inter-compatibility

Syntax sits between two ecosystems: the coding harnesses you already
use and the wider catalog of models you might run behind them. Both
sides keep their full capabilities. Harnesses stay unmodified. Every
OSS model in the catalog keeps every feature its authors shipped it
with. Nothing in the middle is degraded.

## The harness you already use, unmodified

Every supported coding assistant — the Syntax CLI, Codex, Claude Code,
OpenCode, and Pi — works with Syntax without modification. None of
them are forked. None of them have a Syntax plugin. The integration
is as simple as it gets:

1. The harness's existing configuration points it at an LLM endpoint.
2. `syntax connect <agent>` edits that configuration to point at
   `localhost:<port>` (the Bridge) instead.
3. The harness sends OpenAI- or Anthropic-compatible requests to the
   Bridge.
4. The Bridge resolves the model, applies your policy, picks a
   backend, and streams the response back in the wire format the
   harness asked for.

The harness has no idea Syntax is in the middle.

## Reasoning, tool use, and modalities — on by default

For every OSS model in the catalog that supports reasoning, tool use,
or additional modalities, Syntax deploys those capabilities enabled
by default. You don't toggle on tool use; you don't enable the
reasoning channel separately; you don't wire up a different endpoint
for vision or audio. The deployment exposes the model's full declared
feature surface from the first request.

This support is not narrow. It spans the catalog: across LLMs, MoE
models, vision-language models, audio models, embedding and reranking
models, and multimodal generation models, Syntax includes the
engine-specific work that makes each model's official tool-call
parser, reasoning channel, and modality inputs flow correctly through
the OpenAI- and Anthropic-compatible surfaces on the Bridge.

The practical consequence: an OSS model dropped into a deployment
behaves like a frontier hosted model from the harness's perspective.
Tool calls round-trip with full fidelity. Reasoning content arrives
in the channel the harness expects. Image, audio, and other
modalities work without a separate code path. Consuming an OSS
deployment is not a downgraded version of consuming a hosted-provider
deployment.

## Why it matters

- **Zero learning curve.** You keep the keyboard shortcuts, the
  configuration files, the workflow you're used to.
- **No harness lock-in.** If you switch from Codex to Claude Code
  tomorrow, your Syntax config doesn't change at all.
- **No model-feature lock-in.** Reasoning, tool use, and modalities
  on OSS models aren't gated behind hosted-provider APIs — what the
  model's authors shipped is what you get through Syntax.
- **Multiple harnesses simultaneously.** Connect all of them at once.
  They share the same Bridge, the same active model policy, the same
  budgets.
- **Reversible.** `syntax disconnect <agent>` puts the harness's
  config back exactly the way it was. The change is recorded so it
  can always be undone.

## What this isn't

This isn't a "compatibility shim". The Bridge is a real
implementation of the OpenAI- and Anthropic-compatible APIs, with
full streaming, tool-call, and reasoning support. Anything you can do
with those APIs, you can do through Syntax — just with the option to
redirect the request anywhere.

## Compared to alternatives

| Approach | Harness lock-in | OSS model-feature parity | Setup |
|---|---|---|---|
| Proprietary IDE hard-coded to one model | High | N/A — vendor's model only | Trivial |
| Manually wire each tool's config per provider | Medium | Provider-dependent | Per-tool |
| Custom proxy you wrote yourself | Low | You build it | High |
| **Syntax** | None | First-class across the catalog | One install + `syntax connect` |

## Where to start

- [Connecting a harness](/docs/getting-started/connecting-a-harness)
- [Harnesses overview](/docs/harnesses/overview)

---

# Multi-engine inference

> Hardware-aware engine selection across a large compatibility matrix — Syntax owns the optimization work so you don't.

Permalink: https://docs.syntax-ftc.com/docs/differentiators/multi-engine-inference

Choosing how to run a model is a real engineering problem. The "right"
serving stack for a given workload depends on the model architecture,
the hardware family and SKU, the attention backend, the quantization
format, the tool-call and reasoning parsers each engine ships, how
each engine handles KV cache offload, and the way the model needs to be
sharded across one or more hosts. Syntax owns this entire decision so
the surface you build against stays a single, stable endpoint.

## The matrix Syntax is solving for you

When you deploy a model, the autotuner is searching across — at
minimum — the cross product of:

- **Model architecture and modality.** Dense and Mixture-of-Experts
  LLMs, vision-language models, diffusion image and video generators,
  audio models, embedding models, rerankers, OCR, segmentation,
  time-series forecasters, UI-grounding models, 3D mesh-recovery
  models. Each has different serving constraints.
- **Hardware.** Dozens of GPU SKUs across NVIDIA, AMD ROCm, and Apple
  Silicon; CPU-only fallback; single-host versus multi-host
  topologies; and the corresponding cloud instance types when running
  on managed remote.
- **Serving engines.** Multiple engines per model family — vLLM,
  SGLang, TensorRT-LLM, llama.cpp, MLX, diffusion-native servers, and
  others — each with its own performance profile and its own feature
  support per model.
- **Engine-internal configuration.** Attention backends (FlashAttention,
  PagedAttention, architecture-specific custom kernels), KV cache
  layout and hierarchical offload to host RAM, speculative decoding,
  prefix caching, quantization (W4A16, W8A8, FP8, GPTQ, AWQ, GGUF),
  tensor and pipeline parallelism, batch-scheduling policies.

That's not a configuration; it's a search space. Picking the wrong
cell costs you tokens-per-second, time-to-first-token, output
correctness, or money — sometimes all four.

## "Supported" isn't the same as "best supported"

A given model is frequently supported by more than one engine, but
the quality of that support is rarely identical. Some of the
distinctions the autotuner makes:

- A model runs on two engines, but only one ships the official
  tool-call parser. Tool calls degrade on the other. Syntax routes to
  the engine with first-class parser support.
- A model exposes a reasoning channel on both engines, but only one
  surfaces it cleanly through the OpenAI- and Anthropic-compatible
  Bridge. Syntax picks the engine that preserves the reasoning
  round-trip.
- A long-context workload fits in VRAM on one engine but requires
  hierarchical KV cache offload to host RAM on the other. If the
  deployment is latency-sensitive, the in-VRAM engine wins; if it's
  throughput- and context-heavy, the offload-capable engine wins.
- A quantized variant of a model is fast and produces faithful
  outputs on one engine but is numerically unstable on another at the
  same precision. Syntax avoids the unstable combination.

This is the kind of nuance that's otherwise buried in engine release
notes, GitHub issues, and benchmarks you'd have to run yourself.

## What you actually decide

The user-facing input is two values, not the matrix above:

- **A deployment tier.** Either *Performance* — low
  time-to-first-token and high tokens-per-user-per-second, willing to
  pay for the right hardware and serving topology — or
  *Cost-optimized* — aggressively minimize spend while meeting your
  acceptable floors for TTFT and per-user throughput.
- **A target.** Local, self-managed remote, managed remote on dUX, or
  a hosted-provider passthrough.

Everything underneath — engine selection, attention backend,
quantization, parallelism, KV offload, and instance-type selection on
managed remote — is the autotuner's job.

## Party-level planning

A multi-model party (Main Agent, Default Sub-Agent, up to six
Specialists) is a packing and isolation problem on top of single-model
optimization. The autotuner plans across the whole party:

- **What packs together.** Models with complementary memory profiles
  and compatible engines that can share a host without contention get
  co-tenanted to reduce cost.
- **What stays separate.** Models that would harm each other's
  latency under load — for example, a latency-sensitive Main Agent
  next to a throughput-heavy diffusion specialist — get split across
  instances.
- **Role-aware degradation.** Under VRAM pressure, specialists yield
  first, the sub-agent second, the Main Agent only as a last resort.
  Eligible smaller models can fall back to a CPU engine automatically.
- **Tier propagation.** Performance versus Cost-optimized applies to
  the party as a whole and shapes both the packing decisions and the
  instance-type recommendations on managed remote.

## Scales from zero to whatever sustained traffic demands

Every plan the autotuner produces is autoscalable end-to-end. Under
no traffic, a deployment can sit at zero replicas; under sustained
load it scales out across replicas of the same plan, fronted by the
Bridge so the harness sees a single endpoint either way; when load
falls off, replicas wind down. You don't pick a horizontal-pod-
autoscaler policy, you don't model cold-start curves, and you don't
maintain a separate scaling configuration per model — the plan
already encodes how to scale itself.

## What stays the same

From the harness's point of view, none of this is visible. You get
the same OpenAI- or Anthropic-compatible API surface. The model
appears in the harness's model list. Streaming, tool calls, and
reasoning content flow through unchanged. Swapping engines, scaling
out, or re-packing a party doesn't require any change in the harness.

## Where to start

- [Inference → Overview](/docs/inference/overview)
- [Inference → Hardware support](/docs/inference/hardware-support)
- [Concepts → Party Builder](/docs/concepts/party-builder)
- [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties)

---

# Multi-model parties

> One main agent, one sub-agent, up to six specialists — composed into a single deployment with capability scoring and a unified plan.

Permalink: https://docs.syntax-ftc.com/docs/differentiators/multi-model-parties

Most agent stacks pretend a single LLM is enough. A real coding
workflow needs a strong main model, a cheap sub-agent for routine work,
and the option to invoke specialists when the task calls for it.
Syntax's **Party Builder** is the answer.

## What a party gives you

A party is a single deployment that exposes:

- A **Main Agent** — the model your harness primarily talks to.
- A **Default Sub-Agent** — the cheaper model the main agent delegates
  routine tasks to.
- Up to **six Specialists** — each a distinct model with a specific
  capability (e.g., image understanding, OCR, image generation,
  segmentation, TTS, time-series forecasting, etc.).

Specialists are exposed to the main agent as tools. The main agent
decides when to invoke them, just like any other tool call. The
response is folded back into the conversation transparently.

## Why this beats one big model

- **Cost.** A strong main model is expensive per token. A cheap
  sub-agent that handles 80% of routine work cuts the bill
  dramatically without losing capability on the hard 20%.
- **Latency.** Smaller specialists answer faster than asking the main
  model to do everything.
- **Specialization.** Some tasks (image segmentation, OCR, TTS) are
  not LLM tasks at all. Specialists let you reach the right tool for
  each job.
- **Visibility.** The party UI shows which model is carrying which
  capability and where there are coverage gaps before you deploy.

## Capability scoring & plan generation

When you compose a party, the Party Builder shows:

- Which capabilities the chosen models cover and where there are gaps.
- A per-model strength bar so you can see who's doing what.
- A predicted hardware footprint — how the party will fit on your
  local GPU, on a self-hosted box, or on managed remote.

You see all of that *before* you deploy.

## Presets

If composing a party from scratch is more work than you want, the
**Presets** tab gives you ready-to-deploy party definitions — schema-
versioned templates for common workflows that you pick and deploy
directly. Presets are also a clean way to share standard party
configurations across a team or organization.

## Where it deploys

A party deploys to any of the same targets as a single model:

- **Local** — multiple models on your own machine (subject to fit).
- **Self-Managed Remote** — your own SSH-reachable GPU box(es).
- **Managed Remote** — dUX handles placement.

The deployment surface is the same in each case; only the underlying
hardware changes.

## Where to start

- [Concepts → Party Builder](/docs/concepts/party-builder)
- [Inference → Overview](/docs/inference/overview)
- [Differentiators → Multi-engine inference](/docs/differentiators/multi-engine-inference)

---

# Managed remote vs self-managed remote

> When to pick which path — capability, control, and cost tradeoffs.

Permalink: https://docs.syntax-ftc.com/docs/dux-integration/differences-vs-self-managed

Syntax supports two ways to run models on remote hardware:

- **Self-managed remote** — your own SSH-reachable boxes, you own
  the hardware and the OS, Syntax handles the engine and lifecycle.
- **Managed remote (dUX)** — dUX manages the hardware on your
  behalf, in your own cloud accounts. You describe the deployment
  intent; dUX handles provisioning, placement, and scaling. You
  remain the sole admin and can take over directly any time you
  need to.

Each has the right place; this page lays out the tradeoffs.

## Side-by-side

| | Self-managed remote | Managed remote (dUX) |
|---|---|---|
| Hardware admin | You | You — dUX orchestrates, you retain full admin |
| Cloud account | N/A | Yours; dUX operates within it |
| GPU drivers | You install once | dUX installs and updates |
| Autoscaling | None (single host or multi-host you control) | dUX, automatic |
| Replica management | Manual | Automatic |
| Setup time | Provisioning your own host | Minutes — pick a tier |
| Network | Whatever your host has | Cloud-grade ingress |
| Predictable cost | You know your bill (it's your hardware) | Hourly on the cloud you've authorized |
| Privacy | Highest — you own the box | High — your cloud account, dUX-orchestrated, isolated per org |
| Best for | Power users with hardware; teams with strict data residency | Teams that want managed cloud GPU without giving up admin |

## When self-managed remote is the right answer

- You already own GPU hardware that's underutilized.
- You want maximum control of the OS, drivers, and network.
- Data residency is a hard requirement and you can't put weights
  through dUX.
- You want SSH-level visibility into running processes for
  debugging.

## When managed remote is the right answer

- You don't have GPU hardware.
- Your team needs autoscaling because traffic is bursty.
- You want a deployment that's always available without you
  babysitting it.
- You want sharing with teammates to be one click rather than
  manual SSH access.

## Mixing both

Nothing prevents you from using both. A common pattern:

- A handful of "always available" managed remote deployments behind
  the org's Bridge.
- One or more self-hosted remote boxes for experiments, larger
  models, or workloads with strict data residency.

The Bridge routes per-request based on your model policy, so your
harness sees one consistent set of model names regardless of where
each one runs.

## Where to go next

- [Inference → Managed remote](/docs/inference/managed-remote)
- [Inference → Remote self-hosted](/docs/inference/remote-self-hosted)
- [Syntax × dUX → Overview](/docs/dux-integration/overview)

---

# Managed remote on dUX

> The developer-facing flow — pick a model, pick a tier, deploy. dUX handles the rest.

Permalink: https://docs.syntax-ftc.com/docs/dux-integration/managed-remote

The managed remote flow is the most common way Syntax and dUX
collaborate. From the developer's perspective, it's three or four
clicks; behind the scenes, two systems are exchanging structured
intents and placement responses.

## The user-facing flow

1. Open **Deployments → New Deployment**.
2. Pick a category (Chat, General, Coding, Media, Vision, Custom) or
   open the Party Builder for multi-model.
3. Pick **Managed Remote** as the target.
4. Pick a tier: **Latency** or **Throughput**.
5. Set exposure: private endpoint, public endpoint with bearer, or
   both.
6. Submit.

Syntax submits the intent to dUX. The desktop app shows the
deployment moving through statuses — accepted, provisioning, ready —
and surfaces any issues as clear messages rather than dUX-internal
errors.

## What you see when it's ready

When dUX returns a "ready" status, Syntax wires the resulting
endpoint into the Bridge. Practically, that means:

- The model appears in the harness's model list exactly like a local
  or self-hosted-remote model.
- Your harness routes to it transparently — no harness-side
  reconfiguration.
- Multi-model parties show every model in the party as part of a
  single deployment in the **Active Deployments** view; the Main
  Agent's tool list automatically includes its specialists.

## Saved remote targets

After your first managed-remote deployment, you can save the target
configuration — name, tier, exposure, replica policy. Subsequent
deployments to the same logical target are one click and inherit the
saved settings.

This is especially useful for teams that want a consistent set of
deployments across members: save one set of targets at the org level
and members can deploy to them without re-picking each setting.

## Lifecycle

Once a deployment is ready, it stays running until you stop it. From
the desktop app you can:

- **Scale** — increase or decrease replica count (within the tier's
  bounds).
- **Stop** — bring the deployment down. dUX releases the GPU
  resources.
- **Replace** — replace the deployment with a different model
  (atomic where possible).
- **Upgrade** — when the underlying base or engine images change,
  Syntax surfaces an "upgrade available" prompt; accepting issues a
  fresh deployment with the new images.

## Multi-model parties on managed remote

When you deploy a party — Main Agent + Default Sub-Agent + up to
six Specialists — the entire party deploys as a coherent unit on
dUX. Specialists become tool calls available to the Main Agent
exactly the way they do for local or self-hosted-remote parties.

## Where to go next

- [Inference → Managed remote](/docs/inference/managed-remote) — the
  inference-plane view of managed remote.
- [Concepts → Party Builder](/docs/concepts/party-builder)

---

# Syntax × dUX

> How Syntax integrates with dUX for managed cloud GPU — what each side owns, what each side contributes.

Permalink: https://docs.syntax-ftc.com/docs/dux-integration/overview

Syntax integrates with **dUX** to provide managed remote inference
without you having to manage cloud GPU infrastructure. Managed remote
is available to all users; you can also keep using local and
self-hosted remote serving without involving dUX at all.

## What Syntax brings to dUX

When you choose a managed remote deployment, Syntax sends dUX a
**deployment intent**: which model (or party), which target tier
(Latency or Throughput), how many replicas, what exposure (private vs
public), what isolation level. The intent is logical — "I want this
model deployed at this tier" — not a description of how to deploy it.

Syntax also brings:

- **Catalog metadata.** The right serving images, the right
  parameters, the right hardware requirements per model.
- **Party-level planning.** When the deployment is a multi-model party,
  Syntax composes the placement intent so dUX sees a coherent
  multi-model deployment rather than N independent requests.
- **Authentication.** Bearer-token-based auth for the deployed
  endpoint, including the per-deployment exposed-bearer flow.

## What dUX brings to Syntax

dUX is the orchestrator that provisions and manages the hardware your
workloads run on — inside your own cloud accounts, with you as the
sole admin:

- GPU placement and scheduling.
- Driver compatibility and provisioning.
- Autoscaling.
- Multi-replica weight distribution.
- Ingress and load balancing.
- Per-organization isolation.
- Lifecycle (start, scale, stop, replace, upgrade).

dUX returns concrete endpoint URLs and status updates back to Syntax
as the deployment progresses. The underlying machines remain under
your administrative control: dUX operates them on your behalf and you
can step in directly whenever you need to.

## What stays with Syntax

A few things stay on the Syntax side regardless of where the model
ends up running:

- **Bridge.** Your harness still talks to the local Bridge. The
  Bridge routes requests to the dUX-managed endpoint behind the
  scenes.
- **Tools and skills.** The agent's tool list and skills framework
  live in Syntax — dUX is purely about serving the underlying model.
- **Sessions and memory.** Your session history, your memory, and
  your Plan-Mode plans are all client-side or in your home
  directory. dUX never sees them.
- **Budgets and approvals.** Token and compute budgets are computed
  in Syntax against the active org policy.

This separation of concerns is intentional: dUX orchestrates the
cloud GPU hardware on your behalf — in your cloud accounts, with you
as the sole admin — and Syntax is your control surface and your
developer experience.

## What dUX never sees

Putting it the other way: dUX never sees your session content, your
prompts, your code, your tool calls, or your harness. It sees model
weights to load, deployments to scale, and endpoints to serve — and
nothing else.

## Where to go next

- [Managed remote (dUX)](/docs/dux-integration/managed-remote) — the
  developer-facing flow.
- [Permissions and IAM](/docs/dux-integration/permissions-and-iam) —
  identity boundaries between the two systems.
- [Differences vs self-managed](/docs/dux-integration/differences-vs-self-managed)
  — when to pick which path.

---

# Permissions & IAM

> Identity boundaries between Syntax and dUX, and what each system enforces.

Permalink: https://docs.syntax-ftc.com/docs/dux-integration/permissions-and-iam

Syntax and dUX are two distinct systems with two distinct identity
boundaries. Understanding which side enforces what is the key to
predicting behavior and to debugging permission errors when they
happen.

## The two identity boundaries

| Boundary | What it protects | Owned by |
|---|---|---|
| **Syntax identity** | Who you are within Syntax (user, role, org membership). | Syntax (Control Plane). |
| **dUX identity** | What infrastructure permissions Syntax has within your dUX organization. | dUX. |

A request from your harness to a managed-remote endpoint crosses
both: it's authenticated as you in Syntax, then translated into a
deployment-scoped operation that dUX accepts because Syntax has the
right dUX-side permissions to drive it.

## What Syntax enforces

- **Authentication.** The bearer token your harness sends to the
  Bridge.
- **Authorization within Syntax.** Whether you're allowed to deploy,
  scale, expose, or revoke under your role.
- **Org policy.** Whether the model you're trying to deploy is in
  the org's allowed catalog.
- **Budgets.** Whether the deployment fits your token / compute
  budget under the active policy.
- **Audit.** The audit log entry for the operation.

## What dUX enforces

- **Infrastructure permissions.** Whether your Syntax org has the
  right to provision GPUs, set up ingress, etc. in your dUX
  organization.
- **Resource quotas.** Whether the requested deployment fits within
  your dUX-side quotas.
- **Per-org isolation.** Whether the deployment can be placed within
  your dedicated isolation boundary.

## What this means in practice

- **Permission errors from Syntax** look like role / policy / budget
  failures. They have clear messages and tell you what to ask your
  org admin to change.
- **Permission errors from dUX** look like quota / capacity / IAM
  failures at the infrastructure layer. Syntax surfaces them with the
  dUX-side detail so the right team (your dUX admin or whoever owns
  the dUX org-level permissions) can resolve them.

## Settings that matter

In **Settings → Managed Remote**, you can review:

- The dUX organization Syntax is currently configured to talk to.
- The effective dUX permissions Syntax has been granted.
- A simple quick-check that exercises the connection without
  deploying anything.

If any of those look wrong, work with your dUX administrator to
adjust the permissions on their side.

## Where to go next

- [Differences vs self-managed](/docs/dux-integration/differences-vs-self-managed)

---

# Connecting a coding assistant

> How `syntax connect` wires your existing harness to Syntax, and how to disconnect cleanly.

Permalink: https://docs.syntax-ftc.com/docs/getting-started/connecting-a-harness

Syntax integrates with five coding assistants out of the box: the
**Syntax CLI**, **Codex**, **Claude Code**, **OpenCode**, and **Pi**.

The Syntax CLI ships with Syntax and is available the moment you install
it — there's nothing to connect. Every other harness is wired up with a
single `syntax connect <name>` command, and `syntax disconnect <name>`
puts it back exactly the way it was.

## What "connecting" means

When you connect a harness, Syntax edits the harness's own configuration
file to point at the local Bridge. From the harness's point of view, it's
talking to a normal OpenAI- or Anthropic-compatible API; from your point of
view, it's now using whatever model and policy you've configured in Syntax.

The edit is recorded in a per-harness ledger, so `syntax disconnect <agent>`
restores the original configuration byte-for-byte.

## Connect from the CLI

```bash
syntax connect codex
syntax connect claude-code
syntax connect opencode
syntax connect pi
```

Each command:

1. Detects whether the tool is installed.
2. Locates its configuration file in the standard location for your OS.
3. Backs up the current configuration.
4. Edits the configuration to point at the local Bridge.
5. Records the change so it can be reverted.

If the tool isn't installed, the command prints the official install
instructions and exits without making any changes.

Connecting and disconnecting is a CLI-only flow — there is no Harnesses
page in the desktop app. The Syntax CLI is the only harness that needs
no `connect` step, because it's bundled with Syntax.

## Disconnect

```bash
syntax disconnect codex
```

Restores the harness's original configuration. If the tool has since been
deleted from your machine, `disconnect` gracefully cleans up the
Syntax-side ledger without erroring.

## Multiple harnesses at once

You can have any number of harnesses connected simultaneously. Each talks
to Syntax independently and gets the same active model policy. This is the
primary path to "use Syntax with everything" — connect Codex for terminal
work, Claude Code for chat-style coding, OpenCode for editor work, all at
once.

## Per-harness notes

| Harness | Notes |
|---|---|
| **Syntax CLI** | The default agent that ships with Syntax. Always available. |
| **Codex** | Connects to the Bridge through its standard configuration. Tool calls and reasoning flow correctly. |
| **Claude Code** | Uses the Anthropic-compatible Bridge route. Tool calls and reasoning flow correctly. |
| **OpenCode** | JSON-configured. Straightforward connect/disconnect. |
| **Pi** | Connects to the Bridge through its standard configuration. |

For deeper per-harness behavior, see
[Harnesses → Overview](/docs/harnesses/overview).

---

# First launch

> What happens the first time you open Syntax and how to land on a working setup in under five minutes.

Permalink: https://docs.syntax-ftc.com/docs/getting-started/first-launch

The first time you open Syntax — either the desktop app or the CLI — it
runs a short, mostly automatic setup. This page describes what you'll see
and the choices you can make.

## Step 1 — Sign in

You can use Syntax without an account, but signing in lets you sync
settings between machines and (for organizations) wires up team-level
configuration. Sign in with Google, GitHub, or your organization's SSO
provider.

If you're an organization administrator setting up Syntax for the first
time, you'll be prompted to wire up OIDC or SAML at this step.

## Step 2 — Hardware detection

Syntax probes your machine for:

- CPU type and core count
- Available RAM
- GPU(s) — NVIDIA / AMD / Apple Silicon / none
- Disk space available for model weights
- Docker (optional)

You can review the detected configuration on the **Settings → Hardware**
page at any time.

## Step 3 — Pick a starting model (optional)

Syntax doesn't force you to download a model on first launch. If you want a
local model immediately, the **Catalog** page shows recommended starter
models for your hardware tier — small, fast, capable models that will fit
comfortably on your machine. One click downloads weights and registers the
model.

If you'd rather start with a hosted provider (OpenAI, Anthropic, etc.),
skip this step and add your provider key on **Settings → Providers**.

## Step 4 — Connect your first harness

Open the **Harnesses** page or run `syntax connect <agent>` from the CLI.
You'll see the supported coding assistants. Pick the one you already
use; Syntax detects whether it's installed, edits its configuration to
point at the local Bridge, and records the change so it can be reverted at
any time.

The full walk-through is in
[Connecting a harness](/docs/getting-started/connecting-a-harness).

## Step 5 — Start a session

That's it. Open your harness as you normally would. It will now talk to
Syntax instead of going directly to a provider. The first request you send
exercises the full pipeline (model resolution → routing → serving → stream
back) and any issue surfaces on the **Sessions** page with a clear
diagnostic.

## What you can change later

Everything in first-launch is reversible from **Settings**:

- Hardware preferences (which GPU to use, how much VRAM to reserve, etc.)
- Default model and aliases
- Provider keys
- Connected harnesses (each can be disconnected one at a time)

Settings live in your home directory and are user-readable so you can see
exactly what's stored.

---

# Install on Linux

> Install Syntax on Ubuntu 20.04+, Debian 11+, Fedora 38+, or any modern x86_64 / aarch64 distribution.

Permalink: https://docs.syntax-ftc.com/docs/getting-started/install-linux

Syntax runs on x86_64 and aarch64 Linux. It is tested on Ubuntu 20.04+,
Debian 11+, and Fedora 38+, and works on any glibc-based distribution from
roughly the same vintage. Wayland and X11 are both supported for the desktop
app; the CLI works in any TTY.

## System requirements

| Component | Minimum | Recommended |
|---|---|---|
| Distribution | glibc-based, Ubuntu 20.04+ / Debian 11+ / Fedora 38+ equivalent | Latest LTS |
| Architecture | x86_64 or aarch64 | x86_64 with NVIDIA GPU |
| RAM | 16 GB | 32 GB+ for larger models |
| Disk | 20 GB free | 100 GB+ for local model weights |
| Optional GPU | — | NVIDIA (CUDA), AMD (ROCm) |

For local GPU inference, an NVIDIA GPU with recent drivers is the smoothest
path. AMD ROCm is supported for compatible cards. CPU-only serving works for
smaller models.

## Install

```bash
curl -fsSL https://www.syntax-ftc.com/install.sh | bash
```

The installer drops binaries into a per-user location, registers the desktop
entry, and sets up the first-run configuration. Run it again to upgrade.

## GPU prerequisites

If you intend to run open-weight models on a local GPU:

- **NVIDIA**: install the proprietary NVIDIA driver (≥ 545) and ensure
  `nvidia-smi` works.
- **AMD ROCm**: install the ROCm runtime that matches your card and
  distribution. Syntax detects ROCm at first launch and falls back to CPU if
  it isn't available.

Docker is **optional** on Linux but recommended if you plan to run GPU
serving engines that ship as containers. The installer can guide you through
enabling Docker if it isn't present.

## Verify

```bash
syntax --version
syntax doctor
```

`syntax doctor` runs a self-check that reports your detected hardware, the
inference engines that will be available, and any missing dependencies.

## What's next

- [First launch](/docs/getting-started/first-launch)
- [Connecting a harness](/docs/getting-started/connecting-a-harness)
- [Inference → Hardware support](/docs/inference/hardware-support)

---

# Install on macOS

> Install Syntax on macOS 12 (Monterey) or later.

Permalink: https://docs.syntax-ftc.com/docs/getting-started/install-macos

Syntax runs on macOS 12 (Monterey) and later, on both Apple Silicon (M1/M2/M3
and later) and Intel Macs. Apple Silicon is the recommended path: the unified
memory architecture works very well for local inference, and Syntax uses the
native Apple Metal stack to run open-weight models efficiently without any
extra setup.

## System requirements

| Component | Minimum | Recommended |
|---|---|---|
| macOS | 12 (Monterey) | 14 (Sonoma) or later |
| CPU | Apple Silicon or 64-bit Intel | Apple Silicon (M2 Pro / M3 / M4) |
| RAM | 16 GB | 32 GB+ for larger models |
| Disk | 20 GB free | 100 GB+ if you plan to keep multiple model weights locally |

A discrete GPU is **not** required on Apple Silicon. Syntax will use the
unified memory and the Apple-native engine for eligible models.

## Install

The recommended install path is a single command:

```bash
curl -fsSL https://www.syntax-ftc.com/install.sh | bash
```

This downloads the installer, places the Syntax application bundle and CLI
in standard system locations, and sets up the first-run configuration.

The installer is idempotent — running it again on the same machine simply
verifies the install or upgrades to the latest version.

## Verify

After install, open a new terminal and run:

```bash
syntax --version
syntax doctor
```

`syntax doctor` checks for the GPU/CPU it can use, the disk space available
for models, and whether your network can reach the catalog. Any warnings it
prints come with a one-line fix.

## Pick a coding assistant

Syntax does not ship with its own editor. To start a real session, install
one of the supported coding assistants and connect it to Syntax. The
[Connecting a harness](/docs/getting-started/connecting-a-harness) guide walks
through the supported tools and how `syntax connect` wires them up.

## What's next

- [First launch](/docs/getting-started/first-launch) — the desktop app's
  initial setup flow.
- [Connecting a harness](/docs/getting-started/connecting-a-harness) — make
  Codex, Claude Code, OpenCode, and Pi talk to Syntax.

---

# Install on Windows

> Install Syntax on Windows 10 (21H2) or later, with full native and WSL2 paths.

Permalink: https://docs.syntax-ftc.com/docs/getting-started/install-windows

Syntax runs natively on Windows 10 (21H2) and later, and on Windows 11. On
machines with an NVIDIA GPU it can run open-weight models locally; on
machines without a GPU it can still serve smaller models on CPU and route
larger requests to remote backends.

The Syntax desktop app and CLI are shipped as native Windows binaries — no
WSL or Docker is required for the basic experience. WSL2 is supported for
power users who prefer a Linux toolchain.

## System requirements

| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 (21H2) | Windows 11 |
| Architecture | x64 or ARM64 | x64 with NVIDIA GPU |
| RAM | 16 GB | 32 GB+ for larger models |
| Disk | 20 GB free | 100 GB+ for local model weights |
| Optional GPU | — | NVIDIA (CUDA) |

## Install (native)

Open PowerShell and run:

```powershell
iwr -useb https://www.syntax-ftc.com/install.ps1 | iex
```

The installer registers Syntax under your user profile, adds the CLI to
your `PATH`, and creates Start Menu entries. Re-run to upgrade.

## Install (WSL2)

If you prefer a Linux toolchain, install Syntax inside your WSL2
distribution exactly as you would on Linux:

```bash
curl -fsSL https://www.syntax-ftc.com/install.sh | bash
```

You can run the desktop app on Windows and the CLI inside WSL pointing at
the same control plane — both reach the local Bridge over `localhost`.

## GPU prerequisites

If you have an NVIDIA GPU and want local inference:

- Install the latest NVIDIA Studio or Game Ready driver.
- Verify `nvidia-smi` works in PowerShell.
- For WSL2 GPU serving, follow NVIDIA's CUDA-on-WSL guide; the Linux
  installer will detect it at first launch.

## Verify

```powershell
syntax --version
syntax doctor
```

`syntax doctor` reports the detected hardware, the inference engines
available on this machine, and any missing dependencies.

## What's next

- [First launch](/docs/getting-started/first-launch)
- [Connecting a harness](/docs/getting-started/connecting-a-harness)
- [Inference → Hardware support](/docs/inference/hardware-support)

---

# Claude Code

> Use Anthropic's Claude Code CLI with Syntax.

Permalink: https://docs.syntax-ftc.com/docs/harnesses/claude-code

[Claude Code](https://www.anthropic.com/claude-code) is Anthropic's
official CLI for Claude. Syntax connects to it through the
Anthropic-compatible surface on the Bridge, so Claude Code works
unmodified.

## Connect

```bash
syntax connect claude-code
```

The connect flow points Claude Code's configured Anthropic endpoint
at the local Bridge. Because Claude Code natively speaks the
Anthropic-compatible wire format, no shim is involved — the Bridge
accepts those requests and routes them directly.

## What works through Syntax

- Anthropic-style streaming, tool calls, and reasoning all pass through.
- Any model Syntax exposes — Anthropic-hosted (Claude family), other
  hosted providers, local open-weight models, managed remote — is
  reachable from Claude Code through the same endpoint.
- Tool definitions, prompt-caching headers, and content blocks all
  preserve their semantics when routed.

## Why this is interesting

Claude Code is built around an Anthropic-style API. Syntax exposes
the same shape on a localhost endpoint, which means Claude Code can:

- Run a local open-weight model like a Claude clone.
- Route to a non-Anthropic hosted provider while still using Claude Code
  as the UI.
- Be combined with other harnesses on the same machine, all sharing the
  same active model policy through Syntax.

## Disconnect

```bash
syntax disconnect claude-code
```

Restores Claude Code's original Anthropic endpoint configuration.

## Where to start

- [Connecting a harness](/docs/getting-started/connecting-a-harness)
- [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility)
  — how the Bridge exposes its Anthropic-compatible API surface.

---

# Codex

> Use OpenAI's Codex CLI with Syntax.

Permalink: https://docs.syntax-ftc.com/docs/harnesses/codex

[Codex](https://openai.com/codex) is OpenAI's official coding CLI. Syntax
connects to it via the OpenAI-compatible Bridge, so Codex works unmodified.

## Connect

```bash
syntax connect codex
```

The connect flow:

- Locates Codex's configuration in the standard location for your OS.
- Rewrites the model endpoint to point at the local Bridge.
- Backs up the original configuration so `syntax disconnect` can restore it.

If Codex isn't installed, the command prints Codex's official install
instructions and exits without making changes.

## What works through Syntax

- OpenAI-style streaming, tool calls, and reasoning all pass through.
- Any model Syntax exposes — OpenAI-hosted, other hosted providers, local
  open-weight models, or managed remote inference via dUX — is reachable
  from Codex through the same endpoint.
- If you sign in with an OpenAI Plus/Pro subscription, Syntax can route
  Codex requests to OpenAI's models alongside your OSS stack so simple,
  everyday, and complex tasks each land on the right backend.

## Disconnect

```bash
syntax disconnect codex
```

Restores Codex's original endpoint configuration.

## Where to start

- [Connecting a harness](/docs/getting-started/connecting-a-harness)
- [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility)
  — how the Bridge exposes its OpenAI-compatible API surface.

---

# OpenCode

> Use OpenCode with Syntax.

Permalink: https://docs.syntax-ftc.com/docs/harnesses/opencode

[OpenCode](https://opencode.ai) is an open-source coding agent. Syntax
connects to it via the OpenAI-compatible Bridge.

## Connect

```bash
syntax connect opencode
```

The connect flow:

- Locates OpenCode's JSON configuration in the standard location for
  your OS.
- Rewrites the model endpoint to point at the local Bridge.
- Backs up the original configuration.

If OpenCode isn't installed, the command prints OpenCode's official
install instructions and exits.

## What works through Syntax

- Streaming, tool calls, and reasoning all pass through.
- Any model Syntax exposes is usable from OpenCode.
- OpenCode's session controls work unchanged.

## Disconnect

```bash
syntax disconnect opencode
```

Restores OpenCode's original JSON configuration.

## Where to start

- [Connecting a harness](/docs/getting-started/connecting-a-harness)
- [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility)

---

# Harnesses overview

> The coding assistants Syntax integrates with out of the box, and how the connect/disconnect lifecycle works.

Permalink: https://docs.syntax-ftc.com/docs/harnesses/overview

A **harness** is the coding assistant you use day-to-day. Syntax ships with
its own Codex-based `syntax-cli` and integrates with four third-party
harnesses out of the box. Each one keeps using its own UX, its own
configuration file, and its own personality — Syntax just sits behind the
localhost endpoint they already talk to.

| Harness | Type | Streaming | Tool calls | Reasoning | Anthropic-compatible |
|---|---|---|---|---|---|
| [Syntax CLI](/docs/harnesses/syntax-cli) | Terminal (Syntax-native) | ✓ | ✓ | ✓ | ✓ |
| [Codex](/docs/harnesses/codex) | Terminal | ✓ | ✓ | ✓ | ✓ |
| [Claude Code](/docs/harnesses/claude-code) | Terminal / IDE | ✓ | ✓ | ✓ | ✓ (native) |
| [OpenCode](/docs/harnesses/opencode) | Terminal | ✓ | ✓ | ✓ | ✓ |
| [Pi](/docs/harnesses/pi) | Terminal | ✓ | ✓ | ✓ | ✓ |

## How the connect lifecycle works

Connecting any harness to Syntax follows the same lifecycle:

1. **Detect.** Syntax checks whether the harness is installed in any of
   the standard locations for your OS.
2. **Locate config.** It locates the harness's configuration file.
3. **Backup.** It records the current configuration so it can be restored
   later.
4. **Edit.** It rewrites the configuration to point at the local Bridge,
   and applies any harness-specific tweaks.
5. **Record.** It writes a small ledger entry under your home directory
   so the change can always be undone.

`syntax disconnect <agent>` walks the ledger entry in reverse: restores
the original config, removes the ledger row, and the harness is back
exactly the way it was.

## Connect from the CLI

```bash
syntax connect codex
syntax connect claude-code
syntax connect opencode
syntax connect pi
```

If a harness isn't installed, the command prints the official install
instructions and exits without making changes.

Connecting and disconnecting is a CLI-only flow. The Syntax CLI is the
only harness with no `connect` step — it ships with Syntax and is
available immediately.

## What "connected" means concretely

A connected harness:

- Sends every chat request to the local Bridge instead of going directly
  to a provider.
- Inherits the active model policy (aliases, per-tier overrides,
  budgets).
- Streams tokens back in the wire format it expects.
- Can call any specialist tool the active deployment registers (when
  it's part of a multi-model party).

Multiple harnesses can be connected simultaneously without interfering
with each other.

## What this isn't

Connecting a harness to Syntax is **not** a fork or a plugin. The
harnesses are unmodified upstream binaries / extensions. Only their
own configuration files are edited, and only by `syntax connect`.
`syntax disconnect` is fully reversible.

---

# Pi

> Use Pi with Syntax.

Permalink: https://docs.syntax-ftc.com/docs/harnesses/pi

Pi is a terminal-first coding agent. Syntax connects to it via the
OpenAI-compatible Bridge.

## Connect

```bash
syntax connect pi
```

The connect flow:

- Locates Pi's configuration in the standard location for your OS.
- Rewrites the model endpoint to point at the local Bridge.
- Backs up the original configuration.

If Pi isn't installed, the command prints Pi's official install instructions
and exits.

## What works through Syntax

- Streaming, tool calls, and reasoning all pass through.
- Any model Syntax exposes is usable from Pi.
- Pi's session controls work unchanged.

## Disconnect

```bash
syntax disconnect pi
```

Restores Pi's original configuration.

## Where to start

- [Connecting a harness](/docs/getting-started/connecting-a-harness)
- [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility)

---

# Syntax CLI

> Syntax's own native coding agent — always available, no separate install.

Permalink: https://docs.syntax-ftc.com/docs/harnesses/syntax-cli

The **Syntax CLI** is the agent that ships as part of Syntax. It's a
full coding assistant in its own right and is always available without
a separate install or `syntax connect` step. The CLI uses the Bridge,
of course — like every other harness — but it also has a few
Syntax-native capabilities that aren't available through external
harnesses.

## Why use the Syntax CLI

You'd pick the Syntax CLI when you want:

- A coding agent that ships and updates with Syntax itself.
- The full Plan Mode workflow as a first-class CLI experience.
- The runtime-mode cycle (`Ctrl+M`) and explicit cancellation (`Esc`)
  with Syntax-native semantics.
- Native session persistence and fork/rollback behavior.
- Direct access to Syntax-specific tooling (background agents, cron
  scheduling within a session, team coordination, etc.).

## How it relates to the desktop app

The desktop app and the Syntax CLI share the same agent core. A
session started in the CLI can be resumed in the desktop app and vice
versa. Both talk to the Bridge for inference, both inherit your active
model policy, both observe the same approvals.

## Headless / scripted use

The Syntax CLI also has a headless mode for CI/CD, background tasks,
and scripted automation. In headless mode, the CLI runs without a TUI,
prompts are surfaced through structured I/O instead of interactive
input, and tool approval is handled by your configured policy rather
than per-call confirmation.

## Where to start

- Run `syntax-cli --help` for the full harness command list.
- [CLI → Syntax Coding Harness](/docs/cli/syntax-cli) for the
  interactive TUI and the headless `syntax-cli exec` flow.
- [CLI Reference](/docs/cli/overview) for the user-visible commands
  and flags.
- [Concepts → Plan Mode](/docs/concepts/plan-mode) and
  [Runtime Modes](/docs/concepts/runtime-modes) for the agent
  semantics.

---

# Hardware support

> What hardware Syntax runs on, and which capabilities each tier unlocks.

Permalink: https://docs.syntax-ftc.com/docs/inference/hardware-support

Syntax detects your hardware on first launch and chooses the right
serving stack for every model you deploy. This page summarizes what's
supported and what each tier unlocks.

## Per-machine support matrix

| Hardware | OS | LLM serving | Multimodal serving | CPU fallback |
|---|---|---|---|---|
| **NVIDIA GPU** (modern data-center, e.g., H100/H200/L40 class) | Linux / Windows | ✓ — full coverage | ✓ — including image, video, audio | n/a |
| **NVIDIA GPU** (modern consumer, e.g., RTX 40 / 50 series) | Linux / Windows | ✓ — most models | ✓ — many multimodal | n/a |
| **NVIDIA GPU** (older consumer, e.g., RTX 30 / 20 series) | Linux / Windows | ✓ — many models | partial | available |
| **Apple Silicon** (M1 / M2 / M3 / M4 + Pro / Max / Ultra) | macOS | ✓ — extensive | ✓ — many multimodal | n/a |
| **AMD ROCm** (RDNA 3 / CDNA 3 generation) | Linux | ✓ — most models | partial | available |
| **CPU only** (modern x86_64 / ARM64) | any | ✓ — smaller models | limited | primary |

## Memory guidance

For local LLM serving, plan disk and memory roughly as follows:

| Model size | Disk needed for weights | RAM (CPU) | VRAM (GPU) |
|---|---|---|---|
| ≤ 8B parameters | ~10–20 GB | 16 GB+ | 8–16 GB |
| 8–32B parameters | ~30–80 GB | 32 GB+ | 24–48 GB |
| 32–70B parameters | ~80–200 GB | 64 GB+ | 48–96 GB |
| ≥ 70B parameters | ~200 GB+ | 128 GB+ | 96 GB+ or multi-GPU |

These are guidelines; actual requirements depend on the model, the
quantization (when applicable), and the engine choice.

## Optional dependencies

| Dependency | When it's needed |
|---|---|
| **Docker** | Optional. Recommended on Linux when you're running engines that ship as containers. The desktop app guides you through enabling Docker if it's not present. |
| **NVIDIA driver** | Required on NVIDIA hardware. Syntax expects a recent driver; `syntax doctor` will warn if the version is too old. |
| **ROCm runtime** | Required on AMD hardware. Syntax detects ROCm at first launch and falls back to CPU if it's missing. |

## Multi-GPU

Multi-GPU is supported on Linux for both NVIDIA and AMD where the
underlying engine and the chosen model support tensor- or pipeline-
parallel serving. Syntax's autotuner sets the parallelism strategy
based on the model and the available GPUs without you having to pick.

## Multi-host

Multi-host deployments are supported via the
[Remote self-hosted](/docs/inference/remote-self-hosted) and
[Managed remote](/docs/inference/managed-remote) targets. For local
multi-host workflows, treat each host as a remote target and deploy
the party across them.

## Where to go next

- [Local inference](/docs/inference/local-inference) — running models
  on the machine in front of you.
- [Multi-engine inference](/docs/differentiators/multi-engine-inference)
  — why Syntax picks the engine it does.
- [Multimodal capabilities](/docs/inference/multimodal) — image,
  video, audio, 3D, time-series forecasting.

---

# Local inference

> Running models on your own machine — GPU, Apple Silicon, or CPU.

Permalink: https://docs.syntax-ftc.com/docs/inference/local-inference

Local inference runs models on the machine Syntax is installed on. It
works on everything from a CPU-only laptop to a multi-GPU workstation.

## What's supported

| Hardware | Engine class | Notes |
|---|---|---|
| **NVIDIA GPU (Linux)** | GPU-serving engine tuned for the architecture. | Best supported — most open-weight LLMs and multimodal models work. |
| **NVIDIA GPU (Windows)** | GPU-serving engine. | Same coverage as Linux, modern driver required. |
| **Apple Silicon** | Native Apple Metal stack. | Excellent for M-series Macs; no container or driver overhead. |
| **AMD ROCm** | GPU-serving engine for compatible cards. | Supported for current cards; check the catalog for per-model status. |
| **CPU only** | Lightweight CPU serving engine. | Smaller models only. Eligible larger models can also fall back here when GPU VRAM is exhausted by co-tenants. |

## Picking what to run locally

The desktop app's **Catalog** page shows recommended models for your
detected hardware tier. Cards expose:

- **Download Locally** — pull weights to your machine.
- A clear indicator if the model won't fit on your hardware so you can
  pick a smaller variant.

Once a model is downloaded, it's available for deployment from the
**Deployments** page.

## Deploying a single model locally

1. Open **Deployments → New Deployment**.
2. Pick a category (Chat, General, Coding, Media, Vision, Custom) or
   pick **Custom** to compose your own.
3. Choose **Local** as the target.
4. Pick a deployment **Mode** (Latency or Throughput).
5. Submit.

Syntax's autotuner picks the right engine and parameters for your
hardware automatically. The deployment shows up on the **Active
Deployments** page once it's serving.

## Deploying a party locally

Multi-model parties deploy through the same flow. The Party Builder
generates a plan that fits the whole party on your local hardware,
relieving VRAM pressure by role tier when needed (see
[Inference → Overview](/docs/inference/overview)).

## When local isn't enough

- **VRAM-bound** by a model larger than your GPU can hold → consider a
  smaller variant, a quantized version, or routing to a hosted
  provider for that model.
- **Throughput-bound** by sustained heavy load → consider remote
  self-hosted or managed remote.
- **Cold-start sensitive** when you need a model rarely → routing to a
  hosted provider is often the right answer.

## Where to go next

- [Hardware support](/docs/inference/hardware-support) — full hardware
  matrix.
- [Multi-engine inference](/docs/differentiators/multi-engine-inference)
  — why Syntax picks the engine it does.
- [Concepts → Party Builder](/docs/concepts/party-builder) — deploying
  multiple models locally as one party.

---

# Managed remote (dUX)

> dUX-backed cloud GPU. Pick a model, pick a tier, deploy. dUX handles placement, autoscaling, drivers, and ingress.

Permalink: https://docs.syntax-ftc.com/docs/inference/managed-remote

Managed remote is the path for teams that want cloud GPU without
managing infrastructure. It is available to all users and uses
**dUX** to orchestrate the hardware inside your own cloud accounts —
you remain the sole admin of the underlying machines.

## How it works (at a glance)

1. You pick a model (or a party) and a target tier in the desktop app.
2. Syntax submits the deployment intent to dUX.
3. dUX handles the cloud-side work: GPU placement, autoscaling, driver
   compatibility, ingress, isolation.
4. dUX returns the endpoint(s).
5. Syntax wires those endpoints into the Bridge.
6. Your harness sees the managed-remote deployment as a normal model
   in its model list and routes transparently.

## Target tiers

Two managed-remote tiers map to two optimization profiles:

| Tier | Optimized for |
|---|---|
| **Latency** | Lowest time-to-first-token, lowest per-request latency. |
| **Throughput** | Highest tokens-per-second under load, best cost-per-token. |

The exact placement strategy and replica policy live in dUX's
orchestration layer; from your perspective, you choose Latency or
Throughput and dUX handles the rest.

## Saved remote targets

The first time you deploy a model managed remotely, you can choose to
**save the target** — name, tier, exposure, replica policy, and
anything else you configured. Future deploys to the same logical
target are one click.

## Public vs private endpoints

When you deploy managed remote, you can set:

- **Expose private** — the endpoint is reachable from your other
  Syntax tools but not from the public internet.
- **Expose public** — the endpoint is reachable from anywhere with the
  bearer token, suitable for sharing with non-Syntax tools.

Both surfaces issue a per-deployment bearer token (`sk-syntax-…`)
that's scoped to the deployment and can be revoked at any time. See
[Concepts → Exposed endpoints](/docs/concepts/exposed-endpoints) for the
revocation flow.

## What you don't have to think about

- GPU drivers and CUDA / ROCm versions.
- Autoscaler configuration (KEDA, DCGM, etc.).
- Kubernetes namespaces.
- Ingress and load balancing.
- Multi-replica weight distribution.
- Node-pool capacity planning.

dUX orchestrates all of that for you, inside your own cloud accounts,
with you as the sole admin. Syntax stays your control surface.

## Multi-model parties on managed remote

The same Party Builder that composes parties for local deployment can
also deploy them to managed remote. dUX returns placements and Syntax
wires every model in the party into the Bridge so the Main Agent can
call its specialists transparently.

## Where to go next

- [Syntax × dUX → Overview](/docs/dux-integration/overview)

---

# Multimodal capabilities

> Image, video, audio, 3D, UI grounding, OCR, and time-series forecasting — all reachable through the same Bridge.

Permalink: https://docs.syntax-ftc.com/docs/inference/multimodal

Syntax isn't limited to text. The catalog includes models for a wide
range of modalities, and the Bridge exposes them as tools the main
agent can invoke alongside text generation.

## Supported modalities

| Modality | Examples of what's supported |
|---|---|
| **Text generation** | Chat, code, reasoning, structured outputs. |
| **Embedding** | Sentence and code embeddings for semantic search. |
| **Reranking** | Listwise reranking for retrieval pipelines. |
| **Image understanding** | Vision-language models that look at images and answer questions. |
| **OCR** | Optical character recognition. |
| **Image processing** | Style transfer, restoration, adjustment. |
| **Image generation** | Text-to-image, image-to-image diffusion. |
| **Video processing** | Temporal segmentation, video Q&A. |
| **Video generation** | Text-to-video, image-to-video. |
| **Segmentation** | Image and video segmentation. |
| **TTS (text-to-speech)** | High-quality speech synthesis. |
| **Audio generation** | Music and effect generation, V2A Foley. |
| **Audio transcription** | Speech-to-text. |
| **Speech-to-speech** | Voice transformation, style transfer. |
| **Mesh recovery** | 3D mesh from images or video. |
| **UI grounding** | Locate UI elements in screenshots. |
| **Time-series forecasting** | Foundation-model forecasting (Chronos, TimesFM, MOMENT, Granite-TTM, etc.). |

## How multimodal capabilities surface to your harness

Each multimodal capability is exposed as a **tool** the main agent can
invoke. When a multimodal model is deployed, the Bridge registers its
capability — `generate_image`, `transcribe_audio`,
`segment_image`, `text_to_speech`, etc. — so the main agent can pick
the right tool when the user's request needs it.

The capability set is **dynamic**: it's recomputed every time you
deploy or undeploy. If you have an image generator deployed today and
remove it tomorrow, the agent stops seeing `generate_image` as an
available tool.

## Engine selection for multimodal

Each modality is served by the engine class best suited to it:

- LLMs and vision-language models run on GPU-serving engines.
- Image and video generation run on diffusion-friendly engines.
- Specialized non-LLM models (OCR, segmentation, TTS, audio
  generation, mesh recovery, UI grounding, time-series forecasting)
  run on a serving framework optimized for those workloads.
- On Apple Silicon, the Apple-native stack handles eligible models.

The autotuner picks all of this for you — see
[Multi-engine inference](/docs/differentiators/multi-engine-inference).

## Where to go next

- [Models → Modalities](/docs/models/modalities) — modality-by-modality
  capability summary.
- [Models → Purposes](/docs/models/purposes) — the full list of
  Model Purpose categories.
- [Concepts → Party Builder](/docs/concepts/party-builder) — adding
  multimodal specialists to a party.

---

# Inference overview

> How Syntax serves models — local, remote self-hosted, managed remote on dUX, and hosted providers.

Permalink: https://docs.syntax-ftc.com/docs/inference/overview

Every model Syntax exposes ends up running somewhere. The four
**inference targets** are:

| Target | Where it runs | Best for |
|---|---|---|
| **Local** | Your machine — GPU, Apple Silicon, or CPU. | Solo workflows, privacy-first, no network dependency. |
| **Remote self-hosted** | A box you've provisioned (your server, your GPU, your SSH). | Power users with their own hardware. |
| **Managed remote (dUX)** | dUX-managed cloud GPU. | Teams that want managed infrastructure. |
| **Hosted provider** | OpenAI, Anthropic, Google, etc. | Frontier models, predictable cost, no infra. |

All four are reachable through the same Bridge endpoint. Your harness
doesn't know — or care — which one is serving any given request.

## How Syntax decides what to run

Two layers make the decision:

1. **Routing.** When a request arrives at the Bridge, the active model
   policy picks which deployment serves it. If a model is deployed in
   multiple places (e.g., locally and on managed remote), routing
   picks based on your preferences.

2. **Engine selection.** For local and remote-self-hosted serving,
   Syntax's **autotuner** picks the most efficient serving engine
   for the chosen model and your hardware — see
   [Differentiators → Multi-engine inference](/docs/differentiators/multi-engine-inference).

You can override either layer. Aliases let you pin a name to a specific
deployment; per-deployment configuration lets you override engine
choices when you need to.

## Multi-model deployments

When you deploy a multi-model party — a Main Agent, a Default
Sub-Agent, and up to six Specialists — the inference plane plans
holistically:

- All models in the party share the same target (local, self-managed
  remote, or managed remote — but not mixed).
- The autotuner places each model on the available hardware in role
  order so the Main Agent gets the best resources.
- VRAM pressure is relieved by tier when needed: specialists first,
  the sub-agent second, the Main Agent only as a last resort. Eligible
  smaller models can fall back to CPU automatically.

## Targets in depth

- [Local inference](/docs/inference/local-inference) — GPU / Apple
  Silicon / CPU on your own machine.
- [Remote self-hosted](/docs/inference/remote-self-hosted) — your own
  SSH-reachable hardware.
- [Managed remote](/docs/inference/managed-remote) — dUX-backed cloud
  GPU.
- [Hardware support](/docs/inference/hardware-support) — what runs on
  what.
- [Multimodal capabilities](/docs/inference/multimodal) — image,
  video, audio, 3D, time-series forecasting.

---

# Remote self-hosted

> Run models on your own remote box — your server, your GPU, your SSH — with Syntax handling the lifecycle.

Permalink: https://docs.syntax-ftc.com/docs/inference/remote-self-hosted

Remote self-hosted is for users who already have a GPU server (a beefy
home tower, a colo box, a cloud VM you provisioned yourself) and want
Syntax to drive it without giving up control of the machine.

## What "remote self-hosted" means

You provide:

- An SSH-reachable host with the right hardware (GPU, RAM, disk).
- An account / key Syntax can use to log in.

Syntax handles:

- Engine installation on the remote host (curated images, no manual
  driver wrangling beyond the GPU driver itself).
- Model weight delivery.
- Engine lifecycle (start, stop, health checks).
- Wiring the resulting endpoint into the local Bridge so your harness
  reaches the remote model the same way it reaches a local one.

## Setting up a remote target

From **Settings → Remote Targets** in the desktop app:

1. Add the host (hostname, port, username).
2. Provide an SSH key that's authorized on the host.
3. Test the connection — Syntax verifies it can reach the host, has
   access to the right paths, and can probe the GPU.
4. Save.

Once a remote target is saved, deploying a model to it is the same
flow as a local deployment — just pick **Self-Managed Remote** as the
target.

## Disk layout on the remote host

Syntax keeps remote artifacts under a small set of well-known paths in
your home directory on the remote host. Weights, engine binaries, and
log files all live in predictable locations so they're easy to clean
up if you ever decide to remove Syntax.

## Engine selection on the remote host

The same multi-engine inference logic that runs locally also runs on
the remote host: Syntax picks the right engine for the model and the
remote hardware. You don't have to install or manage CUDA, ROCm,
attention backends, or quantization toolchains by hand.

## Multi-host remote deployments

Some models are too large to fit on a single host. For multi-host
deployments, you provide multiple remote targets and pick a
**Strategy**:

- **Performance** — one model per host (lowest latency).
- **Economy** — pack onto the fewest hosts (lowest cost).

The strategy applies to multi-model parties only; single-host targets
ignore it.

## When to use remote self-hosted

- You already own a GPU server.
- You want full control of the OS and drivers.
- You want SSH-level visibility into the running process.
- You don't want managed cloud GPU pricing or vendor lock-in.

## When to use managed remote (dUX) instead

- You don't have hardware and don't want to provision and maintain it.
- You want autoscaling without writing it yourself.
- Your team needs shared deployments behind a single endpoint.

→ [Managed remote](/docs/inference/managed-remote)

---

# How Syntax works

> A high-level walkthrough of Syntax's three planes, the Bridge, and what happens to a request from your editor to a model and back.

Permalink: https://docs.syntax-ftc.com/docs/introduction/how-it-works

Syntax is built around three planes that work together but stay clearly
separated. You don't have to understand every layer to use Syntax — but the
mental model below is enough to predict what will happen for any given
configuration.

## The three planes

| Plane | What it owns |
|---|---|
| **Control** | Identity, organization policy, secrets, budgets, audit logs. |
| **Execution** | Your sessions, the harness lifecycle, the local proxy, approvals, tool orchestration. |
| **Inference** | The model catalog, hardware detection, engine selection, autotuning, model lifecycle. |

The three planes are deliberately decoupled. The control plane never sees the
content of your sessions; the execution plane never has to think about how a
model is autotuned for your specific GPU; the inference plane never reaches
into your editor.

## First-class inter-compatibility

Every supported coding assistant talks to a single OpenAI- and
Anthropic-compatible endpoint on `localhost`. That endpoint is the **Bridge**
— the piece of Syntax that accepts requests in the format your harness already
speaks and routes them to the right backend.

## What happens to a request

1. Your harness sends a chat request to its configured endpoint, which is
   actually Syntax's local Bridge.
2. The Bridge resolves the requested model name against your active model
   policy (alias resolution, tier overrides, budget caps).
3. The Bridge picks a backend — local engine, remote self-hosted engine,
   dUX-managed remote, or a hosted provider — based on what's deployed and
   what your policy allows.
4. The chosen backend serves the request. Local serving uses the most
   efficient engine for your hardware (see [Multi-engine
   inference](/docs/differentiators/multi-engine-inference)).
5. Tokens stream back to your harness in the wire format it expects, so
   streaming, tool calls, reasoning, and multimodal content all render
   correctly.

## What you control

Syntax exposes a small set of high-leverage knobs:

- **Model policy** — which models are allowed for which tiers, with aliases
  and per-deployment overrides.
- **Routing strategy** — Latency vs Throughput, Performance vs Economy on
  multi-host deploys, public vs private endpoint exposure.
- **Approvals** — what tool calls your harness is allowed to run without
  asking, and how risky operations get gated.
- **Budgets** — hard caps and soft warnings for tokens or compute, per user
  and per organization.

## Where this plays out

- The Bridge — what it is and why every harness talks to it — is
  covered in
  [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility).
- The catalog and inference engines are covered in
  [Inference](/docs/inference/overview).
- The dUX-backed managed remote story is in
  [Syntax × dUX](/docs/dux-integration/overview).

---

# What is Syntax?

> Syntax is your fully managed, privately owned, general-purpose AI factory.

Permalink: https://docs.syntax-ftc.com/docs/introduction/what-is-syntax

## The Problem Syntax Solves

AI usage is exploding. Whether you use AI for coding, as a personal assistant,
for media generation, or to power novel agentic products, compute and token
costs add up quickly. As models become more capable, utilizing them at scale
becomes increasingly expensive.

As businesses increasingly integrate AI agents into their core workflows, and
with per-task token consumption being highly unpredictable, they introduce a
volatile and escalating OpEx component that can threaten the viability of
entire business models.

Privacy and compliance present another major hurdle. With foundational model
providers expanding into diverse domains, from healthcare to legal to finance,
companies risk exposing proprietary data, code, and business logic, which
could inadvertently be used to train a competitor's model. Additionally,
strict data sovereignty laws often prohibit organizations from passing
sensitive information through off-the-shelf commercial APIs.

Finally, efficiently deploying and scaling the latest high-performance OSS
models is a complex engineering challenge. Even with the assistance of AI,
building out this infrastructure remains inaccessible to many companies
lacking specialized MLOps and systems-level talent.

## Syntax's Solution

The integration of Syntax and dUX addresses these three core issues from the
ground up.

Syntax operates as a built-in application within dUX, designed to instantly
and effortlessly deploy OSS and proprietary AI on local, private, or fully
managed compute resources. It supports a vast catalog of models across dozens
of categories, each pre-configured for either low-latency or high-throughput
efficiency across a wide array of hardware architectures. Syntax natively
supports high-performance deployments via SGLang, vLLM, Triton, llama.cpp,
and Whisper.cpp, automatically routing each model to its most suitable
inference engine, with automatic scaling from 0 to infinity already wired-in.

Unlike token-based APIs, dUX charges exclusively for the hourly usage of
underlying compute resources. Billing is completely decoupled from token
consumption or API calls, insulating you from the unpredictable behavior of
autonomous AI agents. You pay only the base infrastructure costs of the
chosen cloud provider + dUX's premium, with the Syntax platform itself
provided at zero additional costs!

Because costs are strictly capped by the hourly rate of the provisioned
hardware, businesses can accurately forecast their cloud expenditures,
transforming unpredictable OpEx into a known, manageable expense.

Furthermore, Syntax and dUX guarantee absolute privacy. Clients act as the
sole administrators of any provisioned resources. Native integration with
your own secrets management systems creates a strict technical guarantee —
rather than a reliance on vendor trust — ensuring that neither our team nor
any third party can access your infrastructure or data.

Finally, by allowing clients to select specific cloud providers and regions,
and by fully supporting deployments on private, on-premise, or even
air-gapped infrastructure, Syntax enables organizations to harness frontier
AI capabilities while maintaining strict compliance with all local and
industry regulations.

## Where to Start

- New to Syntax? Continue with [How it works](/docs/introduction/how-it-works).
- Ready to install? Pick your platform under
  [Getting Started](/docs/getting-started/install-macos).

---

# Why Syntax?

> The reasons teams pick Syntax over rolling their own AI deployment stack.

Permalink: https://docs.syntax-ftc.com/docs/introduction/why-syntax

There are plenty of ways to call an LLM from a coding tool or product.
The reason teams pick Syntax — and stay on it — is that it solves a
handful of hard problems together that are usually solved separately
and badly.

## 1. Predictable Costs

dUX bills hourly for compute, not per token. Billing is decoupled from
token consumption entirely, so autonomous agents can't quietly run up
a bill — costs are capped by the hourly rate of the hardware you've
provisioned. The Syntax platform itself is provided at zero additional
cost; you pay the underlying provider plus dUX's premium and nothing
else.

## 2. Private & Secure

Managed remote deployments run on WireGuard-based internal networks.
Endpoints are not reachable from the public internet unless you
explicitly opt in by issuing a public exposed bearer; private
exposures stay entirely within your perimeter.

Neither Syntax nor dUX will ever access your machines, data, logs,
files, or code. Integrate dUX with your own secrets manager and that
becomes a *technical* guarantee — not a vendor trust statement —
that no one outside your org can reach your infrastructure.
Enterprise tenants run in fully isolated environments; on-premise and
air-gapped deployments are first-class.

## 3. Optimized deployments

Choosing how to run a given model is a real engineering problem. The
right combination of hardware SKU, cloud instance type, serving
engine, attention backend, quantization format, and parallelism
strategy differs for every model — and a model that's "supported" by
two different engines may only run well on one of them. Syntax's
autotuner navigates that decision space for you, across the catalog
and across the available hardware.

When you deploy a multi-model party, the same autotuner plans the
whole party: packing co-tenants where it saves cost, isolating models
that would harm each other's latency, and propagating your
Performance vs Cost-optimized tier across every model. You choose a
tier and a target; everything below is handled automatically and
scales from zero to whatever sustained traffic demands.

→ [Differentiators → Multi-engine inference](/docs/differentiators/multi-engine-inference)

## 4. Use the harness you already love

Syntax doesn't ask you to switch editors or learn a new IDE. Codex,
Claude Code, OpenCode, Pi, and the Syntax-native `syntax-cli` all
work out of the box. The integration is reversible: `syntax connect`
edits the harness's own configuration to point at Syntax, and
`syntax disconnect` puts it back exactly the way it was.

The point isn't just compatibility — it's that you can keep the tool
you already trust while shifting the workload underneath it onto
cost-efficient OSS models you deploy yourself. The harness doesn't
care; you get the same UX with a fraction of the per-task cost.

→ [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility)

## 5. Pick the best model for the job

A real workflow needs a strong main model, a cheaper sub-agent, and
sometimes specialists for things like image understanding, OCR,
search, embeddings, or image generation. Syntax's **Models Party**
lets you compose those into a single deployment with one main agent,
one default sub-agent, and up to six specialists — and the main model
can call specialists as tools.

This is where the cost story compounds. Most "frontier model"
workloads are actually a mix of simple, routine, and genuinely hard
sub-tasks. A well-composed party routes the simple and routine work
to small, cheap models and reserves frontier capacity for the small
fraction of tasks that actually need it.

→ [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties)

## 6. Scale to the cloud without managing infrastructure

For team and production workloads, the same model name you ran
locally resolves on managed remote infrastructure via dUX, with
autoscaling from zero to whatever traffic demands already wired in
across dozens of public cloud providers. dUX provisions the hardware
inside your own cloud accounts — you remain the sole admin, and you
describe what you want; dUX handles placement, drivers, autoscaling,
ingress, and lifecycle.

→ [Syntax × dUX → Overview](/docs/dux-integration/overview)

## And one bonus: AI-agent-friendly

These docs ship as a normal website *and* as a Markdown corpus an AI
agent can ingest in one fetch. Every page has a raw-Markdown sibling
URL. There's a [`llms.txt`](/llms.txt) index, a [full
corpus](/llms-full.txt), and a [JSON sitemap](/api/sitemap.json).
When an agent in your codebase needs to reason about Syntax's
capabilities, point it here.

→ [Differentiators → AI-agent-friendly](/docs/differentiators/ai-agent-friendly)

---

# Catalog overview

> Hundreds of models across many purposes, all reachable through the same Bridge.

Permalink: https://docs.syntax-ftc.com/docs/models/catalog-overview

The **catalog** is Syntax's curated set of models. It includes
hundreds of open-weight and provider-hosted models across every model
purpose Syntax supports — text generation, embedding, image
generation, video generation, OCR, segmentation, TTS, audio
generation, mesh recovery, UI grounding, audio transcription,
speech-to-speech, image processing, video processing, reranking, and
time-series forecasting.

## What's in the catalog

Each catalog entry includes everything Syntax needs to serve the model
without you having to wire it up by hand:

- The model's identity (name and provenance).
- Its model purpose (e.g., text generation, image generation, OCR).
- Its modalities (text, image, video, audio).
- Recommended serving parameters for the engines Syntax can use.
- The model's license, so attribution and usage requirements are
  surfaced before you deploy.

When a model is in the catalog, deploying it is a one-click operation
on any supported target — local, self-managed remote, managed remote,
or routed to a hosted provider.

## Open-weight vs provider-hosted

The catalog mixes two kinds of models:

- **Open-weight** models that Syntax can serve directly on your local
  hardware, on a self-hosted remote, or on managed remote (dUX). The
  weights are downloaded and served by Syntax.
- **Provider-hosted** models that Syntax routes to (OpenAI, Anthropic,
  Google, etc.). You bring the API key; Syntax handles the routing,
  alias resolution, and budget tracking.

A single model can have both faces — for example, a frontier model
reachable through a hosted provider and an open-weight equivalent you
can run locally — and Syntax routes per session based on your
preferences.

## How to browse

The desktop app's **Catalog** page is the primary surface for
browsing models. It supports:

- Search by name or capability.
- Filter by model purpose, modality, size, license, and curated tier.
- Sort by recency, parameter count, or curated rank.

The Catalog page is read-only — you browse here, then move to the
**Deployments** page (or the Party Builder) when you're ready to
deploy.

## Per-model deployment options

Every catalog entry exposes:

- **Download Locally** when local serving is supported on your
  hardware.
- **Download Remotely** when you have a self-managed remote target
  configured.
- **Serverless activation** when the model is reachable through a
  hosted provider you've configured.

Whether a button is enabled depends on your hardware tier and your
configured providers — Syntax greys out options that won't work
rather than letting you pick something that will fail.

## Where to go next

- [Models → Purposes](/docs/models/purposes) — the model purpose taxonomy.
- [Models → Modalities](/docs/models/modalities) — what each modality
  means in practice.
- [Models → Reasoning models](/docs/models/reasoning-models) — how
  reasoning effort flows through the Bridge.
- [Models → Tool use](/docs/models/tool-use) — how tool calls work
  across model families.
- [Models → Licensing](/docs/models/licensing) — license display and
  attribution.

---

# Licensing & attribution

> Every model in the catalog declares its license. Syntax surfaces it before you deploy and at runtime.

Permalink: https://docs.syntax-ftc.com/docs/models/licensing

The Syntax catalog is currently restricted to models whose licenses
permit general commercial use, with a strong preference for models
under permissive licenses such as MIT or Apache 2.0.

Every model in the catalog declares its license. Syntax surfaces this
information at three points so you always know the license you're
working under.

## Where you see the license

1. **At browse time.** The Catalog page shows the license on every
   model card. You can also filter by license family.
2. **At deploy time.** Before you confirm a deployment, Syntax shows
   the license again with any usage notes (e.g., research-only,
   commercial-use restrictions, redistribution rules).
3. **At runtime.** Each deployed model's status page shows its
   license and any attribution requirements. Models that require
   visible attribution badges in shipped products surface that
   requirement clearly.

## EULA gates

Some models ship under an end-user license agreement that requires
explicit acceptance before download. For those, Syntax displays the
EULA and waits for you to accept before pulling weights. The
acceptance is recorded so you don't have to re-accept on subsequent
deployments of the same model.

## Provider terms

For provider-hosted models (OpenAI, Anthropic, Google, etc.), the
relevant terms are the provider's own. Syntax surfaces a link to the
provider's usage policy on the catalog card and during deployment.

## Per-model READMEs

When a model has an upstream README — for example, on Hugging Face —
Syntax downloads it alongside the weights. The README is reachable
from the deployed model's status page, so the canonical model card
is right there if you need to check details.

## Why this matters

License surface area is real. A model that's permissive for research
but restrictive for commercial use, or that requires attribution in
the resulting product, or that excludes specific use cases, can
become a compliance issue if it's deployed without that information
visible. Syntax makes the license a first-class piece of metadata so
the right team can see it before the weights ever get downloaded.

## Where to go next

- [Catalog overview](/docs/models/catalog-overview)

---

# Modalities

> Text, image, video, and audio — what each modality means in Syntax and how multimodal models surface to your harness.

Permalink: https://docs.syntax-ftc.com/docs/models/modalities

Where **purpose** is what a model is for, **modality** is what kind of
data the model accepts and emits. Models can be unimodal (text only)
or multimodal (text + image, or text + audio, etc.).

## The four common modalities

| Modality | Meaning |
|---|---|
| **Text** | Tokens — chat, code, structured outputs. |
| **Image** | Still images, in or out. |
| **Video** | Sequences of frames, in or out. |
| **Audio** | Audio waveforms — speech and non-speech. |

A vision-language model is "text + image" in. A diffusion image
generator is "text" in, "image" out. A speech-to-speech model is
"audio" in, "audio" out. A multimodal LLM might accept all four.

## How multimodal LLMs work through Syntax

When you deploy a multimodal LLM (text + image, text + audio, etc.):

- The model is registered with its declared modalities.
- The Bridge accepts content blocks (image URLs, base64-encoded
  images, audio chunks) in the appropriate API surface and routes
  them to the model.
- Streaming, tool calls, and reasoning continue to work alongside
  multimodal input.

If your harness sends a multimodal request to a unimodal model, the
Bridge returns a clear error rather than silently dropping the
non-text content.

## How non-LLM multimodal models work through Syntax

Models with non-LLM modalities — image generators, OCR, segmenters,
TTS, audio generators, mesh recovery, UI grounding, time-series
forecasting — surface as **tools** on the main agent rather than
chat-completion targets.

Concretely: when an image generator is deployed, the main agent sees
a `generate_image` tool. When the user's request needs an image, the
agent calls the tool, the tool runs the model, and the result is
folded back into the conversation. The same pattern applies to every
non-LLM modality.

## Capability scoring in the Party Builder

The Party Builder uses modality coverage as part of its capability
scoring. When you compose a party, you can see at a glance:

- Which input modalities your party can handle.
- Which output modalities your party can produce.
- Where there are gaps — for example, "no image generation in this
  party" or "no audio transcription".

Picking a specialist that closes a gap is a single click.

## Where to go next

- [Models → Purposes](/docs/models/purposes) — the purpose taxonomy.
- [Inference → Multimodal capabilities](/docs/inference/multimodal) —
  what each modality looks like at runtime.
- [Concepts → Party Builder](/docs/concepts/party-builder)

---

# Model purposes

> The coarse classification Syntax uses to know how each model should be served.

Permalink: https://docs.syntax-ftc.com/docs/models/purposes

A **model purpose** is the coarse classification of what a model is
for. Syntax uses model purpose to decide:

- Which serving engine is right for the model.
- Whether the model surfaces as a tool to the main agent (and which
  tool name).
- Which UI surfaces should expose it (Catalog filters, Party Builder
  capability scoring, etc.).

## The current purposes

| Purpose | What it is |
|---|---|
| **Generation** | Text generation — the LLMs your harness chats with. |
| **Embedding** | Sentence and code embeddings for retrieval. |
| **Reranking** | Listwise reranking on top of retrieval results. |
| **OCR** | Extract text from images. |
| **ImageProcessing** | Style transfer, restoration, adjustment. |
| **VideoProcessing** | Temporal segmentation and video Q&A. |
| **ImageGeneration** | Text-to-image and image-to-image diffusion. |
| **VideoGeneration** | Text-to-video and image-to-video. |
| **Segmentation** | Pixel-precise segmentation for images and video. |
| **TTS** | Text-to-speech synthesis. |
| **AudioGeneration** | Music, effects, and V2A Foley. |
| **MeshRecovery** | 3D mesh from images or video. |
| **UIGrounding** | Locate UI elements in screenshots. |
| **AudioTranscription** | Speech-to-text. |
| **SpeechToSpeech** | Voice transformation, style transfer. |
| **TimeSeriesForecasting** | Foundation-model time-series forecasting. |

## Why this matters

When you compose a multi-model party, the Party Builder uses model
purposes to:

- Show capability coverage — which purposes your party covers and
  where there are gaps.
- Suggest specialists for purposes the main agent doesn't cover well.
- Register the right tool name so the main agent can invoke each
  specialist correctly (`generate_image`, `transcribe_audio`,
  `segment_image`, etc.).

When you deploy a single model, the model purpose determines which
serving engine Syntax picks — see
[Differentiators → Multi-engine inference](/docs/differentiators/multi-engine-inference).

## How a new purpose lands

The set of purposes is expandable; new purposes are added as the model
ecosystem grows. The most recent addition is **TimeSeriesForecasting**,
covering foundation-model time-series forecasters. When a new purpose
ships, it lights up automatically in the Catalog filters, the Party
Builder capability scoring, and the engine routing.

## Where to go next

- [Catalog overview](/docs/models/catalog-overview)
- [Models → Modalities](/docs/models/modalities) — modality vs. purpose.
- [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties)

---

# Reasoning models

> How reasoning effort flows through Syntax — three distinct mechanisms behind one consistent control.

Permalink: https://docs.syntax-ftc.com/docs/models/reasoning-models

Modern frontier models support **reasoning** — explicit thinking
before answering. Different model families implement reasoning
differently, and Syntax normalizes those differences behind one
consistent control so your harness doesn't have to know.

## The three reasoning mechanisms

| Mechanism | Used by | What "reasoning effort" maps to |
|---|---|---|
| **Native API reasoning** | Provider-native APIs that already expose a reasoning field (e.g., the Anthropic and Google reasoning fields, OpenAI's `reasoning.effort`, DeepSeek's native think modes). | Directly forwarded to the provider in the format they expect. |
| **Mechanism A** | Open-weight models that expose an explicit "thinking" toggle and budget through their chat template. | Translated into the model's native thinking control plus a budget appropriate to the model's context length. |
| **Mechanism B** | Models without a native thinking control. | An orchestrator runs a planning → execution → critique → repair → verify loop on top of the model. |

## What you see in the harness

Across all three mechanisms, your harness sets a **reasoning effort**
(low / medium / high). Syntax translates that into whatever the
specific model's family expects and the response comes back with the
reasoning fields intact. The harness doesn't have to know which
mechanism is in use.

## Reasoning enabled by default

When a model is deployed and the catalog declares it as a reasoning
model, Syntax enables reasoning by default at sensible levels:

- **Mechanism A** models get their native thinking toggle on, with a
  budget appropriate to the model's context length (long-context
  variants get a larger budget).
- **Mechanism B** models get the orchestrator at "high" effort.
- **Native-API reasoning** models get the provider's "high" or
  equivalent setting.

You can override at any level — per session, per request, or per
deployment.

## Reasoning + tool calls

Reasoning and tool calls compose. Reasoning models can plan, call
tools, and incorporate tool results back into their reasoning before
answering. The Bridge preserves both the reasoning and the tool-call
fields end-to-end.

## Where to go next

- [Models → Tool use](/docs/models/tool-use)
- [Concepts → Plan Mode](/docs/concepts/plan-mode) — a different
  flavor of structured reasoning, applied to the whole task.
- [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility)
  — how the Bridge surfaces the reasoning channel through Chat
  Completions and Messages.

---

# Tool use

> How tool calls work through Syntax — across providers, across engines, across model families.

Permalink: https://docs.syntax-ftc.com/docs/models/tool-use

Tool use — the model deciding to call a function and the harness
executing it — is a core part of any modern coding workflow. Syntax
normalizes tool use across model families so your harness sees a
consistent interface.

## What works through the Bridge

When your harness sends a request with tools defined, you get:

- **Auto tool choice.** The model picks when to call tools and when
  to answer directly.
- **Streamed tool calls.** Tool-call deltas stream as they're
  generated; your harness can render or execute them as they arrive.
- **Tool-call preservation across turns.** Tool calls and their
  results are preserved in conversation history so the model can
  reason about them later.
- **Mixed content.** A response can mix natural-language text and
  tool calls in one turn.

## Across model families

Different model families expose tool calling differently. Some use
OpenAI-style function calls, some use Anthropic-style tool blocks,
some use a model-specific chat template. Syntax handles the
translation:

- **Provider-hosted models** (OpenAI, Anthropic, Google, etc.) get
  their native tool-call format.
- **Open-weight models** with a native tool-call chat template get
  their template's expected format. Where applicable, the Bridge
  also enables the engine's auto-tool-choice path so the model
  reliably emits structured tool calls instead of free-form text.
- **Open-weight models without** a native tool-call template fall
  back to a structured-output approach that produces parseable tool
  calls regardless.

The harness sees the same OpenAI- or Anthropic-shaped tool calls
either way.

## Specialists as tools

When you deploy a multi-model party, every specialist is registered
as a tool the main agent can call.

## Approvals on tool calls

Tool-call execution is gated by your active **Runtime Mode**. In
Default, every meaningful tool call asks for confirmation before
running; in AutoEdit, common edits and routine commands run
unattended; in Bypass, everything runs without asking. See
[Concepts → Runtime Modes](/docs/concepts/runtime-modes).

## Where to go next

- [Models → Reasoning models](/docs/models/reasoning-models) —
  reasoning composes with tool use.
- [Concepts → Specialist Models](/docs/concepts/specialist-models) — how
  specialists become tools.
- [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility)
  — how the Bridge surfaces tool definitions through Chat Completions
  and Messages.

---

# Linux

> Platform notes for running Syntax on Linux.

Permalink: https://docs.syntax-ftc.com/docs/platforms/linux

Syntax runs on x86_64 and aarch64 Linux. Tested on Ubuntu 20.04+,
Debian 11+, and Fedora 38+; works on most modern glibc-based
distributions.

## What works on Linux

- The full desktop app (Wayland and X11) and CLI.
- Local inference on NVIDIA GPUs — the smoothest GPU path.
- Local inference on AMD ROCm — supported for compatible cards.
- Local inference on CPU for smaller models.
- All seven coding harnesses.
- Self-managed remote and managed-remote inference.
- Multi-GPU on a single host where the model and the engine support
  it.

## NVIDIA notes

- Install a recent proprietary NVIDIA driver (≥ 545 recommended).
- `nvidia-smi` should work in your shell.
- Docker is optional; if you have it installed, Syntax can use
  containerized engines for some models. The desktop app guides you
  through enabling Docker if it isn't present.

## AMD ROCm notes

- Install the ROCm runtime that matches your card and distribution.
- `rocminfo` should work.
- Coverage varies by model — `syntax doctor` reports what's available
  for your specific card.

## CPU-only notes

- Smaller LLMs (≤ ~8B parameters with GGUF support) work well on
  modern CPUs.
- Multimodal models are limited on CPU; route those to a hosted
  provider or to a remote target.

## Wayland and X11

The desktop app supports both Wayland and X11. On distributions
where X11 is the legacy fallback, Syntax detects the active session
and uses the right toolchain automatically.

## Headless servers

The CLI works on headless servers without any of the desktop
toolchain. This is the typical setup for CI runners and self-managed
remote targets — you can install the CLI alone, configure the
Bridge, and serve models without ever bringing up the GUI.

## Where to go next

- [Install on Linux](/docs/getting-started/install-linux)
- [Hardware support](/docs/inference/hardware-support)

---

# macOS

> Platform notes for running Syntax on macOS.

Permalink: https://docs.syntax-ftc.com/docs/platforms/macos

Syntax runs on macOS 12 (Monterey) and later, on both Apple Silicon
and Intel Macs. Apple Silicon is the recommended path.

## What works on macOS

- The full desktop app and CLI.
- Local inference on Apple Silicon via the Apple-native engine —
  fast, low-overhead, no Docker required.
- Local inference on Intel Macs via the CPU path (smaller models
  only).
- All seven supported coding harnesses; `syntax connect` knows the
  macOS-standard paths for each.
- Self-managed remote and managed-remote inference.

## Apple Silicon notes

- The unified-memory architecture means you get GPU-class throughput
  on models that fit in RAM.
- Many vision-language and multimodal models work well on M-series
  chips.
- The Apple-native engine doesn't require Docker, drivers, or any
  manual setup beyond installing Syntax.

## Intel Mac notes

- Local LLM serving on Intel Macs uses the CPU engine. Stick to
  smaller models.
- For larger workloads, route to a hosted provider, run a
  self-managed remote target, or use managed remote.

## Install paths

Standard install:

```bash
curl -fsSL https://www.syntax-ftc.com/install.sh | bash
```

The desktop app appears in `/Applications`; the CLI appears in your
user-local `bin` directory and is added to your `PATH`.

## Where Syntax stores data

Syntax stores its configuration, model weights, and per-user state
under your home directory. The desktop app's **Settings → Storage**
page shows the exact paths and total disk usage.

## Where to go next

- [Install on macOS](/docs/getting-started/install-macos)
- [Hardware support](/docs/inference/hardware-support)

---

# Windows

> Platform notes for running Syntax on Windows.

Permalink: https://docs.syntax-ftc.com/docs/platforms/windows

Syntax runs natively on Windows 10 (21H2) and later, and on
Windows 11. Both the desktop app and the CLI ship as native
Windows binaries; WSL2 is supported for users who prefer a Linux
toolchain.

## What works on Windows

- The full desktop app (native, no WSL required).
- The CLI as a native Windows binary, registered on your `PATH`.
- Local inference on NVIDIA GPUs.
- CPU-only inference on machines without a GPU.
- All seven coding harnesses; `syntax connect` knows the
  Windows-standard config paths for each.
- Self-managed remote and managed-remote inference.

## Native install

PowerShell:

```powershell
iwr -useb https://www.syntax-ftc.com/install.ps1 | iex
```

The installer registers Syntax under your user profile, adds the CLI
to your `PATH`, and creates Start Menu entries.

## NVIDIA on Windows

- Install the latest NVIDIA Studio or Game Ready driver.
- `nvidia-smi` should work in PowerShell.
- The native Windows path covers most LLM serving needs without
  WSL.

## WSL2

If you'd rather use a Linux toolchain, install Syntax inside your
WSL2 distribution exactly as you would on Linux. The desktop app
on Windows and the CLI inside WSL can share the same control plane;
both reach the local Bridge over `localhost`.

For NVIDIA GPU serving inside WSL2, follow NVIDIA's CUDA-on-WSL
guide; the Linux installer detects the GPU at first launch.

## ARM64 Windows

ARM64 Windows is supported. CPU-bound workloads work natively;
GPU acceleration depends on the specific ARM64 hardware (e.g.,
NPU support on Snapdragon X is evolving — `syntax doctor` reports
what it can use on your specific machine).

## Where to go next

- [Install on Windows](/docs/getting-started/install-windows)
- [Hardware support](/docs/inference/hardware-support)