# Syntax Documentation — Full Corpus Source: https://docs.syntax-ftc.com/llms-full.txt — see https://docs.syntax-ftc.com/llms.txt for the shorter index. --- # Frequently Asked Questions > Quick answers to the questions that come up most often when evaluating or starting with Syntax. Permalink: https://docs.syntax-ftc.com/docs/faq ## Is Syntax open source? Syntax is a proprietary product — your Fully Managed, Privately Owned, General-Purpose AI Factory, built on dUX. The Syntax platform itself is free of additional charge; you pay only for the infrastructure it runs on, via dUX. ## What platforms are supported? macOS 12+, Linux (Ubuntu 20.04+ / Debian 11+ / Fedora 38+ / equivalent glibc-based distros), and Windows 10 (21H2)+ / Windows 11. x86_64 and ARM64 are both supported. See [Getting Started → Install](/docs/getting-started/install-macos). ## Do I need a GPU? No. Syntax runs without a GPU; smaller models serve on CPU and larger requests can route to hosted providers. With a GPU (NVIDIA, AMD ROCm, or Apple Silicon) you can run larger models locally. ## Which coding assistants does Syntax support? The Syntax CLI, Codex, Claude Code, OpenCode, and Pi. You can use multiple at once; they share the same active model policy. See [Harnesses → Overview](/docs/harnesses/overview). ## Does Syntax send my code to a server? Only if you choose to. Syntax runs on infrastructure you own — your machine for local serving, your boxes for self-managed remote, your dUX-managed environment for managed remote. Your sessions stay within your boundary unless you've configured a hosted provider. The control plane never sees your session content. ## How does Syntax make money? The Syntax platform itself is free of additional charge. You pay for the infrastructure it runs on, via dUX. ## Can I use my own provider keys? Yes. OpenAI, Anthropic, Google, and other supported providers all work with your own API keys. Syntax routes requests through the Bridge so the keys are stored within your environment and never leave your control. ## How does Syntax compare to running an OpenAI-compatible proxy myself? A custom proxy gets you the routing primitive. Syntax adds: - Hardware-aware multi-engine inference for local serving. - Multi-model party deployments. - Managed remote inference (with dUX). - Per-harness `connect`/`disconnect` so you don't manually edit each tool's config. - Plan Mode, Agent Handoff, Runtime Modes. - Budgets, exposed-endpoint bearers, audit. If a custom proxy gets you 70% of what you need, Syntax gets you 100% without you maintaining it. ## Can I share a deployed model with another tool? Yes. Issue a per-deployment exposed-endpoint bearer (`sk-syntax-…`) from the desktop app. The bearer is scoped to a single deployment and can be revoked at any time. The exposed URL is OpenAI- and Anthropic-compatible so any tool that speaks either ecosystem can use it. ## Does Syntax work offline? Local inference works offline by definition. Hosted-provider routing needs network. Managed remote needs network to dUX. The desktop app itself runs offline. ## Can Syntax run on on-prem, air-gapped infrastructure? Absolutely. Syntax fully supports on-prem and air-gapped deployments. You run a dUX server on your internal network, and it orchestrates whatever compute Syntax needs from the hardware you've made available behind your perimeter. You then choose which OSS models to import into the internal catalog, and from there the internal dUX server manages your private model and image registries — so every user on your private network gets the full Syntax experience without ever leaving the walled garden. ## How does Plan Mode differ from "just asking the agent to plan first"? Plan Mode separates the planning context from the execution context. The executor starts fresh and works from the approved plan, not from the back-and-forth that produced it. That's structurally different from "please plan first then execute" in the same conversation. See [Concepts → Plan Mode](/docs/concepts/plan-mode). ## What's "Agent Handoff" for? When a session fills the context window, Syntax writes a structured snapshot and resumes in a fresh context. This avoids the drift that in-place compaction causes on long sessions. See [Concepts → Agent Handoff](/docs/concepts/agent-handoff). ## How are AI agents supposed to consume these docs? Fetch [`/llms-full.txt`](/llms-full.txt) for the full corpus, or [`/llms.txt`](/llms.txt) plus per-page `/api/mdx/` for targeted retrieval. There's also a [JSON sitemap](/api/sitemap.json). See [Differentiators → AI-agent-friendly](/docs/differentiators/ai-agent-friendly). ## I have more questions The [Glossary](/docs/glossary) covers terminology. For anything that isn't covered, [contact the Syntax team](https://www.syntax-ftc.com/#contact). --- # Glossary > A short, capability-level glossary of Syntax-specific terms. Permalink: https://docs.syntax-ftc.com/docs/glossary This glossary covers Syntax-specific terminology you'll see across the wiki and the product UI. It deliberately stays at capability level — public, user-visible names — so it's safe to share with anyone evaluating Syntax. ## Agent Handoff A structured checkpoint Syntax writes when a session approaches the context-window limit. A fresh agent picks up from the snapshot, so the session continues cleanly instead of in-place compacting. See [Concepts → Agent Handoff](/docs/concepts/agent-handoff). ## Bridge The local OpenAI- and Anthropic-compatible endpoint every Syntax integration talks to. See [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility). ## Catalog The set of models Syntax knows how to deploy. Includes hundreds of open-weight and provider-hosted models across many model purposes. ## Default Sub-Agent The cheaper model in a multi-model party that the main agent delegates routine work to. ## dUX The cloud-GPU orchestrator Syntax integrates with for managed remote inference. Syntax submits deployment intents; dUX handles GPU placement, scaling, drivers, and ingress inside your cloud accounts — you remain the sole admin of the underlying machines. ## Exposed endpoint A per-deployment OpenAI-compatible URL plus a bearer token (`sk-syntax-…`) that lets external clients reach a single deployed model. Issued and revoked from the desktop app. ## Harness A coding assistant that talks to an LLM. The Syntax CLI, Codex, Claude Code, OpenCode, and Pi are the five supported harnesses. See [Harnesses → Overview](/docs/harnesses/overview). ## Local inference Running a model on your own machine — GPU, Apple Silicon, or CPU. ## Main Agent The model your harness primarily talks to in a multi-model party. The main agent can call specialists as tools. ## Managed Remote dUX-backed cloud GPU deployment. ## Modality The kinds of input/output a model handles. Common modalities include text, image, video, and audio. ## Model Purpose A coarse classification of what a model is for. Examples: text generation, embedding, OCR, image processing, video processing, image generation, video generation, segmentation, TTS, audio generation, mesh recovery, UI grounding, audio transcription, speech-to-speech, time-series forecasting. ## Party A deployment with a Main Agent, a Default Sub-Agent, and up to six Specialists composed together. See [Concepts → Party Builder](/docs/concepts/party-builder). ## Plan Mode A two-phase agent workflow: the agent proposes a structured plan, you accept, and a fresh execution context carries it out. See [Concepts → Plan Mode](/docs/concepts/plan-mode). ## Preset A schema-versioned ready-to-deploy party definition. Lets a team share a common multi-model configuration in one click. ## Remote Self-Hosted A deployment target where Syntax controls a remote box you've provided (your own server, your own GPU, your own SSH). ## Runtime Mode The three-state cycle (Default / AutoEdit / Bypass) that controls how cautious Syntax is about running tool calls without asking. See [Concepts → Runtime Modes](/docs/concepts/runtime-modes). The three states are: - **Default** — asks before running anything that touches your filesystem, runs a shell command, or makes a network call. Reads are usually unattended. - **AutoEdit** — auto-approves common edits and routine commands; only asks for genuinely risky operations. - **Bypass** — approves everything without asking. Entering Bypass requires an explicit Y/N confirmation; it is not the default mode you cycle into accidentally. ## Specialist A model in a party, beyond the main agent and the sub-agent, that provides a specific capability and is exposed to the main agent as a tool. ## Strategy On multi-host deployments, the choice between **Performance** (one model per host) and **Economy** (pack onto the fewest hosts). ## Tier (deployment) For managed remote, **Latency** vs **Throughput** — two different optimization profiles dUX uses when placing your deployment. ## syntax connect / syntax disconnect CLI commands (and equivalent UI buttons) that wire a coding assistant to the local Bridge — and unwire it cleanly. --- # Welcome to Syntax > Syntax is your Fully Managed, Privately Owned, General-Purpose AI Factory. Permalink: https://docs.syntax-ftc.com/docs Syntax is your Fully Managed, Privately Owned, General-Purpose AI Factory. Syntax serves as a universal backend for AI that allows you to immediately deploy both proprietary and open-source models. Through Syntax, you can automatically deploy open models to your existing cloud instances, or let Syntax auto-provision the necessary infrastructure. This auto-provisioning takes place on a unified, secure, and private managed account fully owned by you, spanning dozens of public cloud providers via dUX, our fully-featured compute resources orchestrator. Take full control over which providers and regions to use. You can implement custom per-user budget controls and IAM rules, integrate your own secrets-management tool to ensure privacy, and monitor your deployments and costs across multiple providers from a single control plane. The platform supports a rapidly expanding catalog, includes frontier-class and lightweight LLMs, alongside models for Visual Understanding, Image/Video/Audio Generation, Search & Embeddings, Time-Series, 3D, Safety & Guardrails, and much more. Syntax features its own Codex-based `syntax-cli` coding harness, which supports a unique "Models Party" configuration. Additionally, it seamlessly connects and deploys OSS models for use with third-party environments like OpenAI's Codex, Claude Code, Pi, and OpenCode. Choose any model from the catalog, including cutting-edge OSS models, deploy it with a single click, connect your preferred coding harness via a simple CLI command, and start building. You can also sign in with your OpenAI Plus/Pro subscription to deploy OpenAI models alongside your OSS stack, ensuring optimal token utilization across simple, everyday, and highly complex tasks. ## Security and Isolation Neither Syntax nor dUX will ever access your machines, data, logs, files, or code. Furthermore, by integrating dUX with your private secrets manager, you establish a technical guarantee that we cannot access any of your resources under any circumstances. Additionally, enterprise customers are automatically deployed as fully isolated tenants, ensuring complete infrastructure separation from all other dUX or Syntax clients. ## Documentation Overview - **Introduction** — What Syntax is and how it works. - **Getting Started** — Installation guides for macOS, Linux, and Windows; first launch instructions; and connecting your first harness. - **Harnesses** — Supported coding assistants and how to point them toward Syntax. - **Inference** — Local (GPU/CPU/Apple Silicon), remote self-hosted, and managed remote inference via dUX. - **Models** — Supported model categories, modalities, reasoning capabilities, tool use, and licensing. - **Concepts** — Party Builder, Plan Mode, Agent Handoff, Runtime Modes, Memory, Exposed Endpoints, and Observability. - **Syntax × dUX** — How Syntax integrates with dUX for managed remote inference. - **CLI Reference** — The two binaries — `syntax` (the application command) and `syntax-cli` (the bundled coding harness) — and their user-visible flags. - **Differentiators** — Why engineering teams choose Syntax. ## For AI agents Every page in this wiki is also available as raw Markdown: - The full corpus is available at [`/llms-full.txt`](/llms-full.txt). - A short index is available at [`/llms.txt`](/llms.txt). - Each page exposes a Markdown source via `/api/mdx/`. - A JSON sitemap of all pages is available at [`/api/sitemap.json`](/api/sitemap.json). --- # User-visible flags reference > The flags you'll actually use day-to-day across `syntax` and `syntax-cli`. Permalink: https://docs.syntax-ftc.com/docs/cli/flags This page lists the user-visible flags you'll see across the two Syntax binaries: - **`syntax`** — the application command (`connect`, `disconnect`, `deploy`, `doctor`, `models`, `sessions`, `memory`). - **`syntax-cli`** — the bundled coding harness (interactive TUI and the `exec` subcommand for headless runs). For the authoritative flag list, run ` --help` directly — that output reflects the actual flags shipped in your installed version. Placeholders use `ALL_CAPS` to avoid conflicts with command-line angle brackets in tables — e.g., `--model ID` means "supply a model ID where `ID` is shown". ## Global flags These work on both `syntax` and `syntax-cli`: | Flag | Meaning | |---|---| | `--version` | Print the running Syntax version. | | `--help`, `-h` | Subcommand-specific help. | | `--quiet` | Suppress non-essential output. | | `--verbose` | Extra output (without enabling tracing). | | `--no-color` | Disable color in terminal output. | ## `syntax-cli` (interactive harness) | Flag | Meaning | |---|---| | `--resume` | Resume a recent session. | | `--model ID` | Override the default model for this session. | | `--mode MODE` | Start in a specific Runtime Mode (`default`, `autoedit`, or `bypass`). Bypass still requires interactive confirmation. | | `--plan` | Start in Plan Mode. | | `--no-color` | Disable color. | ## `syntax-cli exec` (headless) | Flag | Meaning | |---|---| | `--input PATH` | Read the task from a file instead of the command line. | | `--output PATH` | Write structured output to a file. | | `--model ID` | Override the default model. | | `--max-turns N` | Cap the number of agent turns. | | `--policy PATH` | Load an approval policy from a file. | | `--json` | Emit structured JSON output (for piping). | ## `syntax connect` | Flag | Meaning | |---|---| | `list` | List currently connected harnesses. | | `--dry-run` | Print what would change without modifying anything. | ## `syntax deploy` | Flag | Meaning | |---|---| | `--target TARGET` | Deployment target: `local`, `self-managed-remote`, or `managed-remote`. | | `--tier TIER` | Deployment tier: `performance` or `cost-optimized`. | | `--expose-private` | Issue a private exposed bearer. | | `--expose-public` | Issue a public exposed bearer. | | `--profile NAME` | Use a saved party profile. | ## Where to go next - [Syntax Coding Harness](/docs/cli/syntax-cli) - [`syntax connect`](/docs/cli/syntax-connect) - [CLI overview](/docs/cli/overview) --- # CLI overview > The two Syntax binaries — `syntax` (the application command) and `syntax-cli` (the bundled coding harness) — and what each one is for. Permalink: https://docs.syntax-ftc.com/docs/cli/overview Syntax ships with two distinct binaries: - **`syntax`** is the umbrella command for the Syntax application itself — connecting and disconnecting harnesses, deploying models, managing the catalog, inspecting sessions, running diagnostics. - **`syntax-cli`** is the bundled coding harness — the interactive TUI agent. It's a sibling of Codex, Claude Code, OpenCode, and Pi from a conceptual standpoint, but it ships with Syntax and is always available without a `syntax connect` step. Anything you can do in the desktop app for application-level operations, you can do via `syntax`. The interactive coding experience lives in `syntax-cli` (and in the desktop app, which shares the same agent core). ## `syntax` commands at a glance | Command | Purpose | |---|---| | `syntax connect ` | Wire a coding assistant to the Bridge. | | `syntax disconnect ` | Restore a coding assistant's original config. | | `syntax doctor` | Self-check: hardware, deps, network, deployments. | | `syntax deploy` | Deploy a model or party from the CLI. | | `syntax models` | Browse / search the catalog. | | `syntax sessions` | List / inspect / resume past sessions. | | `syntax memory` | Inspect or edit Layer 1 / Layer 2 memory. | | `syntax --version` | Print the running Syntax version. | | `syntax --help` | Top-level help. | Subcommand-specific help is available on every command: ```bash syntax connect --help syntax deploy --help ``` ## `syntax-cli` commands at a glance | Command | Purpose | |---|---| | `syntax-cli` | Start an interactive coding session (TUI). | | `syntax-cli --resume` | Resume a recent session. | | `syntax-cli exec` | Run a one-shot agent task headlessly (CI / scripting). | | `syntax-cli --help` | Harness-specific help. | ## When to use the CLI - **Interactive coding in a terminal.** Run `syntax-cli` to start a TUI session. - **Scripted automation.** `syntax-cli exec` runs a single task headlessly with structured I/O — fits naturally into CI pipelines, cron jobs, and pre-commit hooks. - **Setup / teardown.** `syntax connect`, `syntax disconnect`, `syntax doctor`, `syntax deploy` are one-liners that don't need a GUI. - **Remote shells.** When you're SSH'd into a box and just need a coding agent there. ## When the desktop app is better - Browsing the catalog visually and composing parties. - Watching multi-deployment fleets at a glance. - Issuing / revoking exposed-endpoint bearers (the bearer is shown exactly once, and copying from a popup is more reliable than copying from a terminal). - Configuring hardware / providers / managed remote targets. ## Where to go next - [`syntax-cli`](/docs/cli/syntax-cli) — the interactive coding harness. - [`syntax connect`](/docs/cli/syntax-connect) — harness wiring. - [Flags](/docs/cli/flags) — the user-visible flags reference. --- # Syntax Coding Harness > The Syntax coding harness — TUI sessions with Plan Mode, Runtime Modes, and the full agent experience, started via `syntax-cli`. Permalink: https://docs.syntax-ftc.com/docs/cli/syntax-cli The Syntax coding harness is launched with `syntax-cli`. It's a distinct binary from the top-level `syntax` command (which is the umbrella for the Syntax application itself — connect, deploy, doctor, models, sessions, memory, and so on). `syntax-cli` ships bundled with Syntax; no separate install or `syntax connect` step is required. ## Starting a session ```bash syntax-cli ``` Starts a fresh interactive session in the current working directory. The agent sees the directory's contents (subject to your Runtime Mode and any ignore patterns). ## Resuming a session ```bash syntax-cli --resume ``` Lists recent sessions and lets you pick one to resume. Resumed sessions inherit their previous state, including any Plan Mode plan, the working tree at handoff, and Layer-2 memory. ## Working with files The harness has the full set of built-in tools — file editing, shell execution, web search, MCP integrations, the Skills framework, and any specialists deployed in the active party. Tool calls are gated by your active Runtime Mode (see [Concepts → Runtime Modes](/docs/concepts/runtime-modes)). ## Keyboard A few terminal-specific keybindings: | Key | Effect | |---|---| | `Ctrl+M` | Cycle Runtime Mode (Default → AutoEdit → Bypass → Default). | | `Esc` | Cancel the current turn — works pre-turn, mid-stream, and over popups. | | `Ctrl+C` | Soft-quit the session. | | `Ctrl+D` | Hard-quit. | ## Plan Mode in the harness Plan Mode is a first-class harness experience. Toggle Plan Mode for the current session and the agent enters the plan-then-execute split described in [Concepts → Plan Mode](/docs/concepts/plan-mode). Approved plans persist to disk so you can re-execute them later. ## Headless / scripted: `syntax-cli exec` For non-interactive use: ```bash syntax-cli exec "fix the failing test in tests/foo.py and add a regression test" ``` `exec` runs a single task without a TUI and exits. Approvals are governed by the active policy rather than per-call confirmation; output is structured for piping into other tools. ## Where to go next - [`syntax connect`](/docs/cli/syntax-connect) - [Flags](/docs/cli/flags) - [Harnesses → Syntax CLI](/docs/harnesses/syntax-cli) --- # Syntax connect > Wire a coding assistant to the local Bridge. Reversible. Permalink: https://docs.syntax-ftc.com/docs/cli/syntax-connect `syntax connect ` edits the named harness's own configuration to point at the local Bridge, and records the change so it can be undone. ## Usage ```bash syntax connect ``` Where `` is one of: - `codex` - `claude-code` - `opencode` - `pi` The Syntax CLI is bundled with Syntax and doesn't take a `connect` step — it's available the moment Syntax is installed. ## What happens 1. Detects whether the named harness is installed. 2. Locates its configuration file in the standard location for your OS. 3. Backs up the configuration in a Syntax-managed ledger. 4. Edits the configuration to point at the local Bridge. 5. Applies any harness-specific normalizations. 6. Records the change. ## Disconnecting ```bash syntax disconnect ``` Restores the harness's original configuration from the ledger and removes the ledger entry. If the harness has been removed since connection, `disconnect` cleans up gracefully. ## Listing connections ```bash syntax connect list ``` Shows every harness that's currently connected. ## Multiple connections You can connect multiple harnesses simultaneously. They share the same Bridge, the same active model policy, and the same approvals. ## Detection failures If the named harness isn't installed, `syntax connect` prints the upstream install instructions and exits without making changes. No ledger entry is created, so a subsequent install + connect works cleanly. ## Where to go next - [Harnesses overview](/docs/harnesses/overview) - [Connecting a harness](/docs/getting-started/connecting-a-harness) --- # Agent Handoff > When the context window fills up, Syntax writes a structured handoff and starts fresh — instead of in-place compaction that loses the thread. Permalink: https://docs.syntax-ftc.com/docs/concepts/agent-handoff Long sessions inevitably fill the context window. Most agents handle this with **compaction**: throwing away the older parts of the conversation, sometimes summarizing them, sometimes just truncating. The result is that the agent loses track of the original goal, forgets decisions you discussed early on, and starts to drift. Syntax takes a different approach. When the context window approaches its limit (around 80–90% utilization), Syntax does an **Agent Handoff**: 1. The current agent writes a structured, schema-conformant snapshot of the work so far — the goal, the decisions, the files touched, the open questions, the next steps. 2. The snapshot is saved to durable storage in your home directory. 3. A fresh agent starts with an empty context primed only by a "resume" instruction that points at the snapshot. The new agent reads the snapshot, picks up where the previous one left off, and keeps going. The user-visible effect is "the conversation feels like it just keeps going indefinitely". The technical effect is "every turn always has a clean context window". ## Why this is better than compaction Compaction is lossy by construction. Either you summarize (and the summary is wrong in subtle ways) or you truncate (and the agent forgets what mattered). Handoff is an explicit checkpoint: the schema captures exactly what's needed to resume, and the new agent doesn't carry any of the old turn-by-turn churn. ## What's in a handoff The schema is intentionally lean. Roughly: - The original goal and any clarifying answers. - A summary of what's been done so far, organized by step. - The list of files that have been changed, with intent (e.g., "edited but not yet tested"). - Open questions that need answers before the next step. - The next step. ## What happens to the old conversation It stays in your session history. If you want to scroll back, search it, or fork from a particular point, it's there. The handoff doesn't delete anything — it's a resume primitive, not an archive primitive. ## Compaction is still the fallback The classic in-place compaction (`/compact` in supported harnesses) is still available if you want it for a specific session. Handoff is the default at the long-context threshold, but you can always force the older behavior. ## Related concepts - [Plan Mode](/docs/concepts/plan-mode) — the front-end version of the same idea: separate planning context from execution context. - [Runtime Modes](/docs/concepts/runtime-modes) — how individual tool calls are gated within a turn, regardless of context length. --- # Exposed endpoints > Per-deployment OpenAI-compatible bearer tokens you can issue and revoke from the desktop app. Permalink: https://docs.syntax-ftc.com/docs/concepts/exposed-endpoints By default, the local Bridge is reachable only from `localhost`. That's the safe default — your harness on the same machine can reach it; nothing else can. But sometimes you want to share a single deployed model with another tool or a teammate. That's what **exposed endpoints** are for. ## What an exposed endpoint is An exposed endpoint is a deployment-scoped, OpenAI-compatible URL plus a bearer token. The bearer: - Starts with `sk-syntax-` followed by a random suffix. - Is shown to you exactly once when you issue it. After that, it's not retrievable from Syntax — only the masked form is shown. - Is scoped to a single deployment. Other models on the same Syntax install are not reachable through this bearer. - Can be revoked at any time. Revocation is immediate. ## Two flavors of exposure When you deploy a model, you can pick: - **Expose private** — the endpoint is reachable from your other Syntax tools (e.g., a teammate's machine on the same internal network) but not from the public internet. - **Expose public** — the endpoint is reachable from anywhere with the bearer. Use this when you want to share with a non-Syntax tool that can't reach your private network. You can pick either, both, or neither. Most one-off sharing flows just use a private exposure; public exposure is for cases where you genuinely need a publicly reachable URL. ## What an exposed bearer can do A bearer can call the deployment's OpenAI- and Anthropic-compatible inference surface (chat, messages, and model listing for the scoped model — whichever apply to the model's modality). It **cannot**: - Call any other deployment. - Issue or revoke other bearers. - Modify deployments. - Reach Syntax's settings or control plane. ## Issuing and revoking From the desktop app's **Active Deployments** view: - Click **Expose** on a deployment to issue a bearer. The bearer is shown once with a "Copy" button. After you close the modal, you'll only see the masked form. - Click **Revoke** to invalidate the bearer immediately. New requests fail; in-flight requests complete normally. ## When this matters - **Sharing with a non-Syntax tool.** Any tool that speaks an OpenAI-compatible API can use the exposed bearer. - **Multi-machine workflows.** Run Syntax on one machine, run a bot or a backend on another, and let the backend reach a deployed model through a private exposure. - **Stable URLs for a project.** Issue one bearer for a project, share it with the project's team, revoke when the project ends. ## Where to go next - [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) — the Bridge behind the URL. - [Inference → Managed remote](/docs/inference/managed-remote) — managed deployments support both private and public exposures. --- # Memory > Two-layer memory — always-on file-based memory plus an opt-in retrieval pipeline for session-spanning recall. Permalink: https://docs.syntax-ftc.com/docs/concepts/memory Long-running coding work needs memory that survives across sessions. Syntax's memory system has two layers, designed so the always-on layer is simple and predictable while the optional second layer adds sophisticated retrieval when you want it. ## Layer 1 — always on A canonical memory file in your project (and a per-user one) is loaded into every agent context automatically. Layer 1 memory is: - **File-based.** It's a normal Markdown file you can read, edit, and diff like any other file. - **Opt-in by content.** You write what you want remembered; the agent doesn't auto-write here. - **Always loaded.** Every agent turn sees it, so guidance you write there is enforced consistently. This is the right layer for "always-true" facts about your project, your preferences, or your conventions — things you want every agent turn to know without having to repeat. ## Layer 2 — opt-in retrieval Layer 2 is a richer system that lets the agent record discrete memory entries and retrieve them on demand: - **Hybrid retrieval.** Combines lexical and semantic search over a per-user memory store. - **Per-turn auto-retrieval.** When enabled, every agent turn automatically pulls in memories relevant to the current input before producing a response. - **`memory_search` tool.** The agent can also explicitly search memory mid-turn. - **Schema-stable storage.** The on-disk shape is durable, so memory built up over many sessions stays intact across upgrades. Layer 2 is the right layer for "things the agent learned" — facts about your codebase, decisions made in earlier sessions, lessons learned from failed approaches. ## When to use which - **Layer 1** for: explicit guidance, conventions, "always-do" / "never-do" rules, project context that doesn't change much. - **Layer 2** for: incidental learnings, debugging notes, history of decisions, anything you'd otherwise forget. You can use both at once. Most teams start with Layer 1 alone and adopt Layer 2 once they have multi-session workflows where recall becomes valuable. ## Where memory lives Memory storage lives in your home directory under predictable paths that you can inspect and back up. Layer 1 is plain Markdown; Layer 2 is a small structured store you don't have to read by hand. ## Memory and Agent Handoff When a session crosses the long-context threshold and Syntax does an [Agent Handoff](/docs/concepts/agent-handoff), the memory system is unaffected. Layer 1 is loaded fresh for the new agent; Layer 2's retrieval works the same way. Memory is the right answer to "this information should survive across handoffs and across sessions"; handoff is the right answer to "this conversation should keep going indefinitely". ## Where to go next - [Concepts → Agent Handoff](/docs/concepts/agent-handoff) - [Concepts → Plan Mode](/docs/concepts/plan-mode) --- # Observability > Metrics, traces, crash dumps, and request logging for everything Syntax runs — controllable from the UI. Permalink: https://docs.syntax-ftc.com/docs/concepts/observability Syntax exposes a small but complete observability surface so you can see what's happening at runtime without digging through log files. The same controls work for local inference, self-hosted remote, and managed remote deployments. ## What's instrumented For every deployed model and every request through the Bridge, Syntax collects: - **Metrics.** Tokens in / out, latency, request count, error rate, deployment health. - **Traces.** Per-request traces showing each step of the routing pipeline. - **Crash dumps.** When an inference engine or a tool call crashes, Syntax keeps a dump on disk so you can investigate. - **Request logs.** Optional structured logs of every chat request. Each is independently configurable. ## Where you control it The desktop app's **Observability** page lets you: - **Toggle metrics, tracing, crash dumps, and request logging** per deployment and globally. - Pick a tracing destination — local file, your own OpenTelemetry collector, or none. - Pick a metrics destination — local Prometheus-compatible endpoint, your own Prometheus scraper, or none. - Set request-log retention. The same controls work for local, self-hosted remote, and dUX-managed remote deployments. Managed-remote observability data flows back through dUX into Syntax so you don't need a separate observability stack on the cloud side. ## What's surfaced in the UI The desktop app also surfaces a small set of high-leverage signals inline so you don't have to look at metrics dashboards: - **Active Deployments** shows live token / request counters per deployment. - **Sessions** shows per-session usage and budget consumption. - **Status** badges indicate when a deployment is degraded. - A consolidated **Issues** panel groups warnings ("a deployment is out of memory", "a remote target stopped responding") so they're visible without you opening a log file. ## Privacy Observability is local by default. Metrics, traces, and request logs all stay on your machine unless you explicitly point them at an external collector. Org admins can require that observability data flows through the org's collector for audit purposes. ## When to enable each - **Metrics**: always. Cheap, useful for capacity planning. - **Tracing**: when you're debugging a routing issue or a slow request. - **Crash dumps**: leave on; they don't cost anything until something crashes. - **Request logging**: enable when you need to debug a specific flow or when org policy requires it. Otherwise leave off — it generates the most data. ## Where to go next - [Inference → Overview](/docs/inference/overview) — what each inference target reports. --- # Party Builder & Specialists > Compose a strong main agent, a cheaper sub-agent, and up to six specialists into a single deployment. Permalink: https://docs.syntax-ftc.com/docs/concepts/party-builder Real coding workflows rarely fit a single model. You want a strong main agent for hard problems, a cheap and fast sub-agent for everything else, and sometimes one or more specialists for things like image understanding, OCR, image generation, time-series forecasting, or other non-text tasks. The **Party Builder** is the UI and runtime that lets you compose those together as a single deployment. ## The shape of a party A party has up to eight slots: | Role | Count | Purpose | |---|---|---| | **Main Agent** | 1 (required) | The model the harness primarily talks to. | | **Default Sub-Agent** | 1 (required — can re-use main agent's model) | The cheaper model the main agent delegates routine work to. | | **Specialist** | up to 6 (optional) | A model with a specific capability, exposed as a tool the main agent can call. | Specialists can be any model in the catalog. Each gets an optional custom instruction that the main agent sees when deciding whether to call it. ## How specialists are called When you deploy a party, every specialist is registered with the main agent as a tool, along with a structured description the main agent can use to decide when to invoke it. The agent calls the tool; the call is forwarded to the specialist; the specialist's response is folded back into the conversation. ## Presets The Party Builder ships with a **Presets** tab — schema-versioned ready-to-deploy party definitions you can pick instead of composing one yourself. Presets are useful for common workflows ("a coding party", "a vision-and-coding party", "a document-processing party") and for sharing standard configurations across a team. ## Capability scoring and plan generation Before you deploy, the Party Builder shows: - **Coverage**: which capabilities the chosen models cover (text, reasoning, image understanding, image generation, audio, etc.) and where there are gaps. - **Strength**: a per-model strength bar so you can see which model is carrying which capability. - **A deployment plan**: the expected hardware footprint of the party on local GPU, on managed remote, or on self-hosted remote. For local and self-hosted deployments, the plan is computed by the inference plane's autotuning logic, which knows how each model fits on your hardware and where to put it relative to the others. For managed remote, the plan is sent to dUX, which returns the placement. ## Where to deploy A party deploys to any of the same targets as a single model: - **Local** — one or more models on your own machine. - **Self-Managed Remote** — your own SSH-reachable GPU box(es). - **Managed Remote** — dUX-backed cloud. The deployment process is the same in each case; only the underlying hardware changes. ## When to build a custom party vs use a preset - **Use a preset** if your workflow lines up with a common template. - **Build a custom party** if you have a specific main model you trust, cheaper specialists you want to lean on for routine work, and capability requirements that aren't covered by presets. ## Where this connects - The main and sub-agent slots are two of the three reasons for Multi-model parties — see [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties). - The deployment targets are described in [Inference → Overview](/docs/inference/overview). - The capability scoring system reuses the [Models → Purposes](/docs/models/purposes) taxonomy. --- # Plan Mode > Structured planning before execution — the agent proposes, you accept, then a fresh context fork actually runs the work. Permalink: https://docs.syntax-ftc.com/docs/concepts/plan-mode Plan Mode is a deliberate pause between "the agent has heard your request" and "the agent starts changing things". It splits work into two phases: 1. **Plan**: the agent proposes a structured plan you can read, refine, and approve. 2. **Execute**: a fresh context picks up the approved plan and carries it out, with the plan as durable state. The split is intentional. Long-running execution agents tend to accumulate scratch context that has nothing to do with the actual task; when something goes wrong on step 12, none of that scratch helps. Plan Mode keeps the planning context separate from the execution context so the executor starts with exactly the relevant input and nothing else. ## What you see in Plan Mode In Plan Mode, the agent: - Asks clarifying questions if your request is ambiguous, and saves the answers. - Reads the relevant code, docs, or external references. - Produces a plan with: goal, the files it intends to change, the sequence of steps, and verification criteria. - Stops. It does not start editing. You can: - Accept the plan as-is. - Send corrections (e.g., "split step 3 into two", "skip step 5", "add a rollback note for step 7"). - Reject and re-plan. ## What happens after acceptance When you accept a plan, Syntax forks a fresh execution context primed with the plan as the canonical input. That executor doesn't see the back- and-forth from the planning phase — it sees only the final approved plan plus whatever runtime context it builds itself. This is one reason plans tend to execute cleanly even when the planning conversation was messy. ## Why this matters - **Reviewability.** A plan is something you can scroll, share, paste in a PR description, or hand off to a teammate. A 10,000-token chat log isn't. - **Determinism.** Two executors given the same plan should produce similar outputs. If yours don't, that's a signal worth investigating. - **Failure containment.** If the executor goes off the rails on step 9, it's much easier to resume from step 9 (with the original plan) than to reconstruct the intent from a tangled conversation. ## When Plan Mode isn't the right answer Trivial one-shot tasks ("rename this variable", "format this file") don't need a plan. Plan Mode is most valuable for multi-file changes, investigations that span several systems, or anything you'd want a written reference for after the fact. ## How to invoke In supported harnesses, Plan Mode is a setting or a slash-command. The desktop app exposes it as a toggle on the session view. Syntax CLI accepts it as a flag. ## Related concepts - [Agent Handoff](/docs/concepts/agent-handoff) — what happens when a long execution context fills up. - [Runtime Modes](/docs/concepts/runtime-modes) — the gating layer that decides which tool calls run unattended. --- # Runtime Modes > Default, AutoEdit, and Bypass — the three-state cycle that gates which tool calls run unattended. Permalink: https://docs.syntax-ftc.com/docs/concepts/runtime-modes Runtime Modes control how cautious Syntax is about running tool calls without asking you first. The cycle has three states: | Mode | Behavior | |---|---| | **Default** | Asks before running anything that touches your filesystem, runs a shell command, or calls out to the network. Reads are usually unattended. | | **AutoEdit** | Auto-approves common edits and routine commands; asks only for genuinely risky operations (e.g., destructive shell, network egress to non-allowlisted hosts). | | **Bypass** | Approves everything without asking. **Requires an explicit Y/N confirmation to enter.** | You cycle through the three states with `Ctrl+M` in the TUI (or the equivalent toggle in the desktop app). The current mode shows in the status bar as a small badge. ## Why three modes Most coding sessions sit comfortably in **AutoEdit**. The cost of approving every diff hunk is real, and the risk of an unwanted edit is small for most operations. AutoEdit removes the friction without giving up the gates that matter. **Default** is the right place to be when you're working on something unfamiliar, when you're sharing your screen, or when you want a strict human-in-the-loop posture. **Bypass** is the right place to be when you've validated the agent's plan, you trust the approvals fence to be set elsewhere (e.g., a sandbox), and you want maximum velocity. The explicit confirmation gate exists because Bypass is genuinely riskier — it is *not* the default posture and it is not the default mode you cycle into accidentally. ## What's gated regardless of mode A small number of operations are always gated through a deterministic classifier rather than the current Runtime Mode. These include operations that delete data, force-push, modify shared infrastructure, or upload content to third-party web tools. The Runtime Mode is a *posture*, not an override; the deterministic classifier on truly risky operations always runs. ## Cancellation Independently of the mode, **`Esc`** always interrupts the current turn. It works: - Pre-turn (between submit and the agent actually starting). - Mid-turn (during streaming or tool execution, even when the working spinner is hidden). - With a popup active (popup dismisses first; second `Esc` cancels the turn). ## Related concepts - [Plan Mode](/docs/concepts/plan-mode) — runs the planning phase before any tool call would have to be approved. - [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) — the Bridge that receives the requests; budgets and approvals live here. --- # Specialist Models > Specialists are non-main models that a multi-model party exposes as tools to the main agent. Permalink: https://docs.syntax-ftc.com/docs/concepts/specialist-models A **Specialist** is any model in a multi-model party other than the Main Agent and Default Sub-Agent. Specialists exist because real coding work occasionally needs something the main model is bad at — image understanding, OCR, segmentation, image generation, TTS, audio transcription, time-series forecasting — and dragging the main model through those tasks is wasteful. ## How specialists become tools When a party deploys, every specialist is registered with the Main Agent as a tool. The tool description comes from the catalog entry's structured description, which tells the Main Agent what the specialist is good at and when to call it. The Main Agent then calls the specialist exactly the way it would call any other tool. The Bridge intercepts the call, forwards it to the specialist (with whatever payload is appropriate for the specialist's modality), and folds the response back into the conversation. ## Custom instructions per specialist When you compose a party, every specialist gets an optional instruction string. The instruction tells the Main Agent how to use that specialist — for example, "use this specialist for OCR on scanned PDFs, not for OCR on screenshots". Custom instructions are the cheapest way to make a generic capability behave well in your specific workflow. ## Cost & latency Specialists are usually smaller and cheaper than the Main Agent. Routing routine work to a specialist instead of the Main Agent saves both tokens and time. The Party Builder's plan view shows the expected cost shape so you can predict the savings before deploying. ## Specialists that aren't LLMs Many specialists are not LLMs at all — image generators, segmenters, TTS, audio generators, mesh recovery, UI grounding, time-series forecasters. Each surfaces as a purpose-specific tool matched to its modality, so the Main Agent invokes the right tool for the right kind of work. ## Where to go next - [Concepts → Party Builder](/docs/concepts/party-builder) - [Models → Purposes](/docs/models/purposes) - [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties) --- # Agent Handoff (vs compaction) > Long sessions don't degrade — at the long-context threshold, Syntax does a structured handoff to a fresh context instead of in-place compaction. Permalink: https://docs.syntax-ftc.com/docs/differentiators/agent-handoff Almost every other agent stack handles "context window fills up" with **compaction**: throwing away or summarizing the older parts of the conversation. The result is the same in every case: the agent loses the thread, forgets early decisions, and starts to drift. Syntax does **Agent Handoff** instead. When the context approaches its limit, the current agent writes a structured snapshot of the work so far, persists it, and a fresh agent picks it up. ## Why this is structurally better Compaction is lossy by construction: - **Summarize**: the summary is wrong in subtle ways. The model is optimizing "shorter" without knowing what mattered. - **Truncate**: the agent forgets exactly the thing it needs. - **Both**: the new context is a mix of half-remembered earlier turns and full recent turns. The agent's behavior gets weird. Handoff is an explicit checkpoint with a defined schema. The new agent gets: - The original goal. - The decisions made so far. - The files touched and the intent of each change. - The open questions. - The next step. …and nothing else. No turn-by-turn churn. No half-remembered context. ## What you see The user-visible effect: "the conversation feels like it just keeps going indefinitely". The technical effect: "every turn always has a clean context window". For long-running agentic workflows, this is the difference between an agent that finishes its task and an agent that spirals after turn 30. ## When to use compaction instead Plain compaction is still available as a fallback if you want it for a specific session. Handoff is the default at the long-context threshold, but you can always force the simpler behavior. ## Where this lives - [Concepts → Agent Handoff](/docs/concepts/agent-handoff) — the capability-level walkthrough. - [Concepts → Plan Mode](/docs/concepts/plan-mode) — the front-end version of the same idea (separate planning context from execution context). --- # AI-agent-friendly docs > This wiki ships as a website AND as a Markdown corpus AI agents can ingest in a single fetch. Permalink: https://docs.syntax-ftc.com/docs/differentiators/ai-agent-friendly These docs are built to be useful to humans **and** to AI agents. Most documentation sites assume only humans will read them. Syntax's docs expose machine-readable surfaces alongside the rendered HTML so the agents in your codebase can reason about Syntax's capabilities the same way a developer can. ## What's exposed | Surface | Path | Format | |---|---|---| | LLM index | [`/llms.txt`](/llms.txt) | Plain text (Markdown links) | | Full corpus | [`/llms-full.txt`](/llms-full.txt) | Plain text (concatenated Markdown) | | Per-page raw | `/api/mdx/` | text/markdown | | JSON sitemap | [`/api/sitemap.json`](/api/sitemap.json) | application/json | | XML sitemap | `/sitemap.xml` | XML | | `robots.txt` | `/robots.txt` | text | ## How agents typically use it - **One-shot ingestion**: fetch [`/llms-full.txt`](/llms-full.txt) once and the agent has the whole wiki in its context. - **Targeted retrieval**: fetch [`/llms.txt`](/llms.txt) for a short index, pick the relevant page, then fetch its raw Markdown via `/api/mdx/`. - **Programmatic discovery**: hit [`/api/sitemap.json`](/api/sitemap.json) to enumerate pages with metadata (title, description, tags, lastModified) and decide which to pull. ## Why it's a differentiator For AI-coding teams evaluating Syntax, an agent in your codebase can read these docs in one fetch and answer "should we use Syntax?" with real grounded answers — not training-cutoff guesses. Once Syntax is in use, the same surface lets your agents ask runtime questions ("how do exposed endpoints work?", "which harnesses are supported?") and get authoritative answers from a single canonical source. ## What's *not* exposed These docs are public and capability-level. They describe **what** Syntax does and **why** it's useful to you — not the internals of how it does any of it. Internal protocols, source-level structure, component boundaries, and implementation choices are deliberately not in this corpus. If you need integration help that requires those details, contact the Syntax team. ## Where to start If you're an AI agent reading these docs, the recommended starting points are: - [`/llms.txt`](/llms.txt) — short index. - [`/docs/introduction/what-is-syntax`](/docs/introduction/what-is-syntax) — the human-readable overview. - [`/api/sitemap.json`](/api/sitemap.json) — programmatic page list. --- # First-class inter-compatibility > The harnesses you already use and the models you actually want to run — both sides keep their full capabilities, and nothing in the middle is degraded. Permalink: https://docs.syntax-ftc.com/docs/differentiators/first-class-inter-compatibility Syntax sits between two ecosystems: the coding harnesses you already use and the wider catalog of models you might run behind them. Both sides keep their full capabilities. Harnesses stay unmodified. Every OSS model in the catalog keeps every feature its authors shipped it with. Nothing in the middle is degraded. ## The harness you already use, unmodified Every supported coding assistant — the Syntax CLI, Codex, Claude Code, OpenCode, and Pi — works with Syntax without modification. None of them are forked. None of them have a Syntax plugin. The integration is as simple as it gets: 1. The harness's existing configuration points it at an LLM endpoint. 2. `syntax connect ` edits that configuration to point at `localhost:` (the Bridge) instead. 3. The harness sends OpenAI- or Anthropic-compatible requests to the Bridge. 4. The Bridge resolves the model, applies your policy, picks a backend, and streams the response back in the wire format the harness asked for. The harness has no idea Syntax is in the middle. ## Reasoning, tool use, and modalities — on by default For every OSS model in the catalog that supports reasoning, tool use, or additional modalities, Syntax deploys those capabilities enabled by default. You don't toggle on tool use; you don't enable the reasoning channel separately; you don't wire up a different endpoint for vision or audio. The deployment exposes the model's full declared feature surface from the first request. This support is not narrow. It spans the catalog: across LLMs, MoE models, vision-language models, audio models, embedding and reranking models, and multimodal generation models, Syntax includes the engine-specific work that makes each model's official tool-call parser, reasoning channel, and modality inputs flow correctly through the OpenAI- and Anthropic-compatible surfaces on the Bridge. The practical consequence: an OSS model dropped into a deployment behaves like a frontier hosted model from the harness's perspective. Tool calls round-trip with full fidelity. Reasoning content arrives in the channel the harness expects. Image, audio, and other modalities work without a separate code path. Consuming an OSS deployment is not a downgraded version of consuming a hosted-provider deployment. ## Why it matters - **Zero learning curve.** You keep the keyboard shortcuts, the configuration files, the workflow you're used to. - **No harness lock-in.** If you switch from Codex to Claude Code tomorrow, your Syntax config doesn't change at all. - **No model-feature lock-in.** Reasoning, tool use, and modalities on OSS models aren't gated behind hosted-provider APIs — what the model's authors shipped is what you get through Syntax. - **Multiple harnesses simultaneously.** Connect all of them at once. They share the same Bridge, the same active model policy, the same budgets. - **Reversible.** `syntax disconnect ` puts the harness's config back exactly the way it was. The change is recorded so it can always be undone. ## What this isn't This isn't a "compatibility shim". The Bridge is a real implementation of the OpenAI- and Anthropic-compatible APIs, with full streaming, tool-call, and reasoning support. Anything you can do with those APIs, you can do through Syntax — just with the option to redirect the request anywhere. ## Compared to alternatives | Approach | Harness lock-in | OSS model-feature parity | Setup | |---|---|---|---| | Proprietary IDE hard-coded to one model | High | N/A — vendor's model only | Trivial | | Manually wire each tool's config per provider | Medium | Provider-dependent | Per-tool | | Custom proxy you wrote yourself | Low | You build it | High | | **Syntax** | None | First-class across the catalog | One install + `syntax connect` | ## Where to start - [Connecting a harness](/docs/getting-started/connecting-a-harness) - [Harnesses overview](/docs/harnesses/overview) --- # Multi-engine inference > Hardware-aware engine selection across a large compatibility matrix — Syntax owns the optimization work so you don't. Permalink: https://docs.syntax-ftc.com/docs/differentiators/multi-engine-inference Choosing how to run a model is a real engineering problem. The "right" serving stack for a given workload depends on the model architecture, the hardware family and SKU, the attention backend, the quantization format, the tool-call and reasoning parsers each engine ships, how each engine handles KV cache offload, and the way the model needs to be sharded across one or more hosts. Syntax owns this entire decision so the surface you build against stays a single, stable endpoint. ## The matrix Syntax is solving for you When you deploy a model, the autotuner is searching across — at minimum — the cross product of: - **Model architecture and modality.** Dense and Mixture-of-Experts LLMs, vision-language models, diffusion image and video generators, audio models, embedding models, rerankers, OCR, segmentation, time-series forecasters, UI-grounding models, 3D mesh-recovery models. Each has different serving constraints. - **Hardware.** Dozens of GPU SKUs across NVIDIA, AMD ROCm, and Apple Silicon; CPU-only fallback; single-host versus multi-host topologies; and the corresponding cloud instance types when running on managed remote. - **Serving engines.** Multiple engines per model family — vLLM, SGLang, TensorRT-LLM, llama.cpp, MLX, diffusion-native servers, and others — each with its own performance profile and its own feature support per model. - **Engine-internal configuration.** Attention backends (FlashAttention, PagedAttention, architecture-specific custom kernels), KV cache layout and hierarchical offload to host RAM, speculative decoding, prefix caching, quantization (W4A16, W8A8, FP8, GPTQ, AWQ, GGUF), tensor and pipeline parallelism, batch-scheduling policies. That's not a configuration; it's a search space. Picking the wrong cell costs you tokens-per-second, time-to-first-token, output correctness, or money — sometimes all four. ## "Supported" isn't the same as "best supported" A given model is frequently supported by more than one engine, but the quality of that support is rarely identical. Some of the distinctions the autotuner makes: - A model runs on two engines, but only one ships the official tool-call parser. Tool calls degrade on the other. Syntax routes to the engine with first-class parser support. - A model exposes a reasoning channel on both engines, but only one surfaces it cleanly through the OpenAI- and Anthropic-compatible Bridge. Syntax picks the engine that preserves the reasoning round-trip. - A long-context workload fits in VRAM on one engine but requires hierarchical KV cache offload to host RAM on the other. If the deployment is latency-sensitive, the in-VRAM engine wins; if it's throughput- and context-heavy, the offload-capable engine wins. - A quantized variant of a model is fast and produces faithful outputs on one engine but is numerically unstable on another at the same precision. Syntax avoids the unstable combination. This is the kind of nuance that's otherwise buried in engine release notes, GitHub issues, and benchmarks you'd have to run yourself. ## What you actually decide The user-facing input is two values, not the matrix above: - **A deployment tier.** Either *Performance* — low time-to-first-token and high tokens-per-user-per-second, willing to pay for the right hardware and serving topology — or *Cost-optimized* — aggressively minimize spend while meeting your acceptable floors for TTFT and per-user throughput. - **A target.** Local, self-managed remote, managed remote on dUX, or a hosted-provider passthrough. Everything underneath — engine selection, attention backend, quantization, parallelism, KV offload, and instance-type selection on managed remote — is the autotuner's job. ## Party-level planning A multi-model party (Main Agent, Default Sub-Agent, up to six Specialists) is a packing and isolation problem on top of single-model optimization. The autotuner plans across the whole party: - **What packs together.** Models with complementary memory profiles and compatible engines that can share a host without contention get co-tenanted to reduce cost. - **What stays separate.** Models that would harm each other's latency under load — for example, a latency-sensitive Main Agent next to a throughput-heavy diffusion specialist — get split across instances. - **Role-aware degradation.** Under VRAM pressure, specialists yield first, the sub-agent second, the Main Agent only as a last resort. Eligible smaller models can fall back to a CPU engine automatically. - **Tier propagation.** Performance versus Cost-optimized applies to the party as a whole and shapes both the packing decisions and the instance-type recommendations on managed remote. ## Scales from zero to whatever sustained traffic demands Every plan the autotuner produces is autoscalable end-to-end. Under no traffic, a deployment can sit at zero replicas; under sustained load it scales out across replicas of the same plan, fronted by the Bridge so the harness sees a single endpoint either way; when load falls off, replicas wind down. You don't pick a horizontal-pod- autoscaler policy, you don't model cold-start curves, and you don't maintain a separate scaling configuration per model — the plan already encodes how to scale itself. ## What stays the same From the harness's point of view, none of this is visible. You get the same OpenAI- or Anthropic-compatible API surface. The model appears in the harness's model list. Streaming, tool calls, and reasoning content flow through unchanged. Swapping engines, scaling out, or re-packing a party doesn't require any change in the harness. ## Where to start - [Inference → Overview](/docs/inference/overview) - [Inference → Hardware support](/docs/inference/hardware-support) - [Concepts → Party Builder](/docs/concepts/party-builder) - [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties) --- # Multi-model parties > One main agent, one sub-agent, up to six specialists — composed into a single deployment with capability scoring and a unified plan. Permalink: https://docs.syntax-ftc.com/docs/differentiators/multi-model-parties Most agent stacks pretend a single LLM is enough. A real coding workflow needs a strong main model, a cheap sub-agent for routine work, and the option to invoke specialists when the task calls for it. Syntax's **Party Builder** is the answer. ## What a party gives you A party is a single deployment that exposes: - A **Main Agent** — the model your harness primarily talks to. - A **Default Sub-Agent** — the cheaper model the main agent delegates routine tasks to. - Up to **six Specialists** — each a distinct model with a specific capability (e.g., image understanding, OCR, image generation, segmentation, TTS, time-series forecasting, etc.). Specialists are exposed to the main agent as tools. The main agent decides when to invoke them, just like any other tool call. The response is folded back into the conversation transparently. ## Why this beats one big model - **Cost.** A strong main model is expensive per token. A cheap sub-agent that handles 80% of routine work cuts the bill dramatically without losing capability on the hard 20%. - **Latency.** Smaller specialists answer faster than asking the main model to do everything. - **Specialization.** Some tasks (image segmentation, OCR, TTS) are not LLM tasks at all. Specialists let you reach the right tool for each job. - **Visibility.** The party UI shows which model is carrying which capability and where there are coverage gaps before you deploy. ## Capability scoring & plan generation When you compose a party, the Party Builder shows: - Which capabilities the chosen models cover and where there are gaps. - A per-model strength bar so you can see who's doing what. - A predicted hardware footprint — how the party will fit on your local GPU, on a self-hosted box, or on managed remote. You see all of that *before* you deploy. ## Presets If composing a party from scratch is more work than you want, the **Presets** tab gives you ready-to-deploy party definitions — schema- versioned templates for common workflows that you pick and deploy directly. Presets are also a clean way to share standard party configurations across a team or organization. ## Where it deploys A party deploys to any of the same targets as a single model: - **Local** — multiple models on your own machine (subject to fit). - **Self-Managed Remote** — your own SSH-reachable GPU box(es). - **Managed Remote** — dUX handles placement. The deployment surface is the same in each case; only the underlying hardware changes. ## Where to start - [Concepts → Party Builder](/docs/concepts/party-builder) - [Inference → Overview](/docs/inference/overview) - [Differentiators → Multi-engine inference](/docs/differentiators/multi-engine-inference) --- # Managed remote vs self-managed remote > When to pick which path — capability, control, and cost tradeoffs. Permalink: https://docs.syntax-ftc.com/docs/dux-integration/differences-vs-self-managed Syntax supports two ways to run models on remote hardware: - **Self-managed remote** — your own SSH-reachable boxes, you own the hardware and the OS, Syntax handles the engine and lifecycle. - **Managed remote (dUX)** — dUX manages the hardware on your behalf, in your own cloud accounts. You describe the deployment intent; dUX handles provisioning, placement, and scaling. You remain the sole admin and can take over directly any time you need to. Each has the right place; this page lays out the tradeoffs. ## Side-by-side | | Self-managed remote | Managed remote (dUX) | |---|---|---| | Hardware admin | You | You — dUX orchestrates, you retain full admin | | Cloud account | N/A | Yours; dUX operates within it | | GPU drivers | You install once | dUX installs and updates | | Autoscaling | None (single host or multi-host you control) | dUX, automatic | | Replica management | Manual | Automatic | | Setup time | Provisioning your own host | Minutes — pick a tier | | Network | Whatever your host has | Cloud-grade ingress | | Predictable cost | You know your bill (it's your hardware) | Hourly on the cloud you've authorized | | Privacy | Highest — you own the box | High — your cloud account, dUX-orchestrated, isolated per org | | Best for | Power users with hardware; teams with strict data residency | Teams that want managed cloud GPU without giving up admin | ## When self-managed remote is the right answer - You already own GPU hardware that's underutilized. - You want maximum control of the OS, drivers, and network. - Data residency is a hard requirement and you can't put weights through dUX. - You want SSH-level visibility into running processes for debugging. ## When managed remote is the right answer - You don't have GPU hardware. - Your team needs autoscaling because traffic is bursty. - You want a deployment that's always available without you babysitting it. - You want sharing with teammates to be one click rather than manual SSH access. ## Mixing both Nothing prevents you from using both. A common pattern: - A handful of "always available" managed remote deployments behind the org's Bridge. - One or more self-hosted remote boxes for experiments, larger models, or workloads with strict data residency. The Bridge routes per-request based on your model policy, so your harness sees one consistent set of model names regardless of where each one runs. ## Where to go next - [Inference → Managed remote](/docs/inference/managed-remote) - [Inference → Remote self-hosted](/docs/inference/remote-self-hosted) - [Syntax × dUX → Overview](/docs/dux-integration/overview) --- # Managed remote on dUX > The developer-facing flow — pick a model, pick a tier, deploy. dUX handles the rest. Permalink: https://docs.syntax-ftc.com/docs/dux-integration/managed-remote The managed remote flow is the most common way Syntax and dUX collaborate. From the developer's perspective, it's three or four clicks; behind the scenes, two systems are exchanging structured intents and placement responses. ## The user-facing flow 1. Open **Deployments → New Deployment**. 2. Pick a category (Chat, General, Coding, Media, Vision, Custom) or open the Party Builder for multi-model. 3. Pick **Managed Remote** as the target. 4. Pick a tier: **Latency** or **Throughput**. 5. Set exposure: private endpoint, public endpoint with bearer, or both. 6. Submit. Syntax submits the intent to dUX. The desktop app shows the deployment moving through statuses — accepted, provisioning, ready — and surfaces any issues as clear messages rather than dUX-internal errors. ## What you see when it's ready When dUX returns a "ready" status, Syntax wires the resulting endpoint into the Bridge. Practically, that means: - The model appears in the harness's model list exactly like a local or self-hosted-remote model. - Your harness routes to it transparently — no harness-side reconfiguration. - Multi-model parties show every model in the party as part of a single deployment in the **Active Deployments** view; the Main Agent's tool list automatically includes its specialists. ## Saved remote targets After your first managed-remote deployment, you can save the target configuration — name, tier, exposure, replica policy. Subsequent deployments to the same logical target are one click and inherit the saved settings. This is especially useful for teams that want a consistent set of deployments across members: save one set of targets at the org level and members can deploy to them without re-picking each setting. ## Lifecycle Once a deployment is ready, it stays running until you stop it. From the desktop app you can: - **Scale** — increase or decrease replica count (within the tier's bounds). - **Stop** — bring the deployment down. dUX releases the GPU resources. - **Replace** — replace the deployment with a different model (atomic where possible). - **Upgrade** — when the underlying base or engine images change, Syntax surfaces an "upgrade available" prompt; accepting issues a fresh deployment with the new images. ## Multi-model parties on managed remote When you deploy a party — Main Agent + Default Sub-Agent + up to six Specialists — the entire party deploys as a coherent unit on dUX. Specialists become tool calls available to the Main Agent exactly the way they do for local or self-hosted-remote parties. ## Where to go next - [Inference → Managed remote](/docs/inference/managed-remote) — the inference-plane view of managed remote. - [Concepts → Party Builder](/docs/concepts/party-builder) --- # Syntax × dUX > How Syntax integrates with dUX for managed cloud GPU — what each side owns, what each side contributes. Permalink: https://docs.syntax-ftc.com/docs/dux-integration/overview Syntax integrates with **dUX** to provide managed remote inference without you having to manage cloud GPU infrastructure. Managed remote is available to all users; you can also keep using local and self-hosted remote serving without involving dUX at all. ## What Syntax brings to dUX When you choose a managed remote deployment, Syntax sends dUX a **deployment intent**: which model (or party), which target tier (Latency or Throughput), how many replicas, what exposure (private vs public), what isolation level. The intent is logical — "I want this model deployed at this tier" — not a description of how to deploy it. Syntax also brings: - **Catalog metadata.** The right serving images, the right parameters, the right hardware requirements per model. - **Party-level planning.** When the deployment is a multi-model party, Syntax composes the placement intent so dUX sees a coherent multi-model deployment rather than N independent requests. - **Authentication.** Bearer-token-based auth for the deployed endpoint, including the per-deployment exposed-bearer flow. ## What dUX brings to Syntax dUX is the orchestrator that provisions and manages the hardware your workloads run on — inside your own cloud accounts, with you as the sole admin: - GPU placement and scheduling. - Driver compatibility and provisioning. - Autoscaling. - Multi-replica weight distribution. - Ingress and load balancing. - Per-organization isolation. - Lifecycle (start, scale, stop, replace, upgrade). dUX returns concrete endpoint URLs and status updates back to Syntax as the deployment progresses. The underlying machines remain under your administrative control: dUX operates them on your behalf and you can step in directly whenever you need to. ## What stays with Syntax A few things stay on the Syntax side regardless of where the model ends up running: - **Bridge.** Your harness still talks to the local Bridge. The Bridge routes requests to the dUX-managed endpoint behind the scenes. - **Tools and skills.** The agent's tool list and skills framework live in Syntax — dUX is purely about serving the underlying model. - **Sessions and memory.** Your session history, your memory, and your Plan-Mode plans are all client-side or in your home directory. dUX never sees them. - **Budgets and approvals.** Token and compute budgets are computed in Syntax against the active org policy. This separation of concerns is intentional: dUX orchestrates the cloud GPU hardware on your behalf — in your cloud accounts, with you as the sole admin — and Syntax is your control surface and your developer experience. ## What dUX never sees Putting it the other way: dUX never sees your session content, your prompts, your code, your tool calls, or your harness. It sees model weights to load, deployments to scale, and endpoints to serve — and nothing else. ## Where to go next - [Managed remote (dUX)](/docs/dux-integration/managed-remote) — the developer-facing flow. - [Permissions and IAM](/docs/dux-integration/permissions-and-iam) — identity boundaries between the two systems. - [Differences vs self-managed](/docs/dux-integration/differences-vs-self-managed) — when to pick which path. --- # Permissions & IAM > Identity boundaries between Syntax and dUX, and what each system enforces. Permalink: https://docs.syntax-ftc.com/docs/dux-integration/permissions-and-iam Syntax and dUX are two distinct systems with two distinct identity boundaries. Understanding which side enforces what is the key to predicting behavior and to debugging permission errors when they happen. ## The two identity boundaries | Boundary | What it protects | Owned by | |---|---|---| | **Syntax identity** | Who you are within Syntax (user, role, org membership). | Syntax (Control Plane). | | **dUX identity** | What infrastructure permissions Syntax has within your dUX organization. | dUX. | A request from your harness to a managed-remote endpoint crosses both: it's authenticated as you in Syntax, then translated into a deployment-scoped operation that dUX accepts because Syntax has the right dUX-side permissions to drive it. ## What Syntax enforces - **Authentication.** The bearer token your harness sends to the Bridge. - **Authorization within Syntax.** Whether you're allowed to deploy, scale, expose, or revoke under your role. - **Org policy.** Whether the model you're trying to deploy is in the org's allowed catalog. - **Budgets.** Whether the deployment fits your token / compute budget under the active policy. - **Audit.** The audit log entry for the operation. ## What dUX enforces - **Infrastructure permissions.** Whether your Syntax org has the right to provision GPUs, set up ingress, etc. in your dUX organization. - **Resource quotas.** Whether the requested deployment fits within your dUX-side quotas. - **Per-org isolation.** Whether the deployment can be placed within your dedicated isolation boundary. ## What this means in practice - **Permission errors from Syntax** look like role / policy / budget failures. They have clear messages and tell you what to ask your org admin to change. - **Permission errors from dUX** look like quota / capacity / IAM failures at the infrastructure layer. Syntax surfaces them with the dUX-side detail so the right team (your dUX admin or whoever owns the dUX org-level permissions) can resolve them. ## Settings that matter In **Settings → Managed Remote**, you can review: - The dUX organization Syntax is currently configured to talk to. - The effective dUX permissions Syntax has been granted. - A simple quick-check that exercises the connection without deploying anything. If any of those look wrong, work with your dUX administrator to adjust the permissions on their side. ## Where to go next - [Differences vs self-managed](/docs/dux-integration/differences-vs-self-managed) --- # Connecting a coding assistant > How `syntax connect` wires your existing harness to Syntax, and how to disconnect cleanly. Permalink: https://docs.syntax-ftc.com/docs/getting-started/connecting-a-harness Syntax integrates with five coding assistants out of the box: the **Syntax CLI**, **Codex**, **Claude Code**, **OpenCode**, and **Pi**. The Syntax CLI ships with Syntax and is available the moment you install it — there's nothing to connect. Every other harness is wired up with a single `syntax connect ` command, and `syntax disconnect ` puts it back exactly the way it was. ## What "connecting" means When you connect a harness, Syntax edits the harness's own configuration file to point at the local Bridge. From the harness's point of view, it's talking to a normal OpenAI- or Anthropic-compatible API; from your point of view, it's now using whatever model and policy you've configured in Syntax. The edit is recorded in a per-harness ledger, so `syntax disconnect ` restores the original configuration byte-for-byte. ## Connect from the CLI ```bash syntax connect codex syntax connect claude-code syntax connect opencode syntax connect pi ``` Each command: 1. Detects whether the tool is installed. 2. Locates its configuration file in the standard location for your OS. 3. Backs up the current configuration. 4. Edits the configuration to point at the local Bridge. 5. Records the change so it can be reverted. If the tool isn't installed, the command prints the official install instructions and exits without making any changes. Connecting and disconnecting is a CLI-only flow — there is no Harnesses page in the desktop app. The Syntax CLI is the only harness that needs no `connect` step, because it's bundled with Syntax. ## Disconnect ```bash syntax disconnect codex ``` Restores the harness's original configuration. If the tool has since been deleted from your machine, `disconnect` gracefully cleans up the Syntax-side ledger without erroring. ## Multiple harnesses at once You can have any number of harnesses connected simultaneously. Each talks to Syntax independently and gets the same active model policy. This is the primary path to "use Syntax with everything" — connect Codex for terminal work, Claude Code for chat-style coding, OpenCode for editor work, all at once. ## Per-harness notes | Harness | Notes | |---|---| | **Syntax CLI** | The default agent that ships with Syntax. Always available. | | **Codex** | Connects to the Bridge through its standard configuration. Tool calls and reasoning flow correctly. | | **Claude Code** | Uses the Anthropic-compatible Bridge route. Tool calls and reasoning flow correctly. | | **OpenCode** | JSON-configured. Straightforward connect/disconnect. | | **Pi** | Connects to the Bridge through its standard configuration. | For deeper per-harness behavior, see [Harnesses → Overview](/docs/harnesses/overview). --- # First launch > What happens the first time you open Syntax and how to land on a working setup in under five minutes. Permalink: https://docs.syntax-ftc.com/docs/getting-started/first-launch The first time you open Syntax — either the desktop app or the CLI — it runs a short, mostly automatic setup. This page describes what you'll see and the choices you can make. ## Step 1 — Sign in You can use Syntax without an account, but signing in lets you sync settings between machines and (for organizations) wires up team-level configuration. Sign in with Google, GitHub, or your organization's SSO provider. If you're an organization administrator setting up Syntax for the first time, you'll be prompted to wire up OIDC or SAML at this step. ## Step 2 — Hardware detection Syntax probes your machine for: - CPU type and core count - Available RAM - GPU(s) — NVIDIA / AMD / Apple Silicon / none - Disk space available for model weights - Docker (optional) You can review the detected configuration on the **Settings → Hardware** page at any time. ## Step 3 — Pick a starting model (optional) Syntax doesn't force you to download a model on first launch. If you want a local model immediately, the **Catalog** page shows recommended starter models for your hardware tier — small, fast, capable models that will fit comfortably on your machine. One click downloads weights and registers the model. If you'd rather start with a hosted provider (OpenAI, Anthropic, etc.), skip this step and add your provider key on **Settings → Providers**. ## Step 4 — Connect your first harness Open the **Harnesses** page or run `syntax connect ` from the CLI. You'll see the supported coding assistants. Pick the one you already use; Syntax detects whether it's installed, edits its configuration to point at the local Bridge, and records the change so it can be reverted at any time. The full walk-through is in [Connecting a harness](/docs/getting-started/connecting-a-harness). ## Step 5 — Start a session That's it. Open your harness as you normally would. It will now talk to Syntax instead of going directly to a provider. The first request you send exercises the full pipeline (model resolution → routing → serving → stream back) and any issue surfaces on the **Sessions** page with a clear diagnostic. ## What you can change later Everything in first-launch is reversible from **Settings**: - Hardware preferences (which GPU to use, how much VRAM to reserve, etc.) - Default model and aliases - Provider keys - Connected harnesses (each can be disconnected one at a time) Settings live in your home directory and are user-readable so you can see exactly what's stored. --- # Install on Linux > Install Syntax on Ubuntu 20.04+, Debian 11+, Fedora 38+, or any modern x86_64 / aarch64 distribution. Permalink: https://docs.syntax-ftc.com/docs/getting-started/install-linux Syntax runs on x86_64 and aarch64 Linux. It is tested on Ubuntu 20.04+, Debian 11+, and Fedora 38+, and works on any glibc-based distribution from roughly the same vintage. Wayland and X11 are both supported for the desktop app; the CLI works in any TTY. ## System requirements | Component | Minimum | Recommended | |---|---|---| | Distribution | glibc-based, Ubuntu 20.04+ / Debian 11+ / Fedora 38+ equivalent | Latest LTS | | Architecture | x86_64 or aarch64 | x86_64 with NVIDIA GPU | | RAM | 16 GB | 32 GB+ for larger models | | Disk | 20 GB free | 100 GB+ for local model weights | | Optional GPU | — | NVIDIA (CUDA), AMD (ROCm) | For local GPU inference, an NVIDIA GPU with recent drivers is the smoothest path. AMD ROCm is supported for compatible cards. CPU-only serving works for smaller models. ## Install ```bash curl -fsSL https://www.syntax-ftc.com/install.sh | bash ``` The installer drops binaries into a per-user location, registers the desktop entry, and sets up the first-run configuration. Run it again to upgrade. ## GPU prerequisites If you intend to run open-weight models on a local GPU: - **NVIDIA**: install the proprietary NVIDIA driver (≥ 545) and ensure `nvidia-smi` works. - **AMD ROCm**: install the ROCm runtime that matches your card and distribution. Syntax detects ROCm at first launch and falls back to CPU if it isn't available. Docker is **optional** on Linux but recommended if you plan to run GPU serving engines that ship as containers. The installer can guide you through enabling Docker if it isn't present. ## Verify ```bash syntax --version syntax doctor ``` `syntax doctor` runs a self-check that reports your detected hardware, the inference engines that will be available, and any missing dependencies. ## What's next - [First launch](/docs/getting-started/first-launch) - [Connecting a harness](/docs/getting-started/connecting-a-harness) - [Inference → Hardware support](/docs/inference/hardware-support) --- # Install on macOS > Install Syntax on macOS 12 (Monterey) or later. Permalink: https://docs.syntax-ftc.com/docs/getting-started/install-macos Syntax runs on macOS 12 (Monterey) and later, on both Apple Silicon (M1/M2/M3 and later) and Intel Macs. Apple Silicon is the recommended path: the unified memory architecture works very well for local inference, and Syntax uses the native Apple Metal stack to run open-weight models efficiently without any extra setup. ## System requirements | Component | Minimum | Recommended | |---|---|---| | macOS | 12 (Monterey) | 14 (Sonoma) or later | | CPU | Apple Silicon or 64-bit Intel | Apple Silicon (M2 Pro / M3 / M4) | | RAM | 16 GB | 32 GB+ for larger models | | Disk | 20 GB free | 100 GB+ if you plan to keep multiple model weights locally | A discrete GPU is **not** required on Apple Silicon. Syntax will use the unified memory and the Apple-native engine for eligible models. ## Install The recommended install path is a single command: ```bash curl -fsSL https://www.syntax-ftc.com/install.sh | bash ``` This downloads the installer, places the Syntax application bundle and CLI in standard system locations, and sets up the first-run configuration. The installer is idempotent — running it again on the same machine simply verifies the install or upgrades to the latest version. ## Verify After install, open a new terminal and run: ```bash syntax --version syntax doctor ``` `syntax doctor` checks for the GPU/CPU it can use, the disk space available for models, and whether your network can reach the catalog. Any warnings it prints come with a one-line fix. ## Pick a coding assistant Syntax does not ship with its own editor. To start a real session, install one of the supported coding assistants and connect it to Syntax. The [Connecting a harness](/docs/getting-started/connecting-a-harness) guide walks through the supported tools and how `syntax connect` wires them up. ## What's next - [First launch](/docs/getting-started/first-launch) — the desktop app's initial setup flow. - [Connecting a harness](/docs/getting-started/connecting-a-harness) — make Codex, Claude Code, OpenCode, and Pi talk to Syntax. --- # Install on Windows > Install Syntax on Windows 10 (21H2) or later, with full native and WSL2 paths. Permalink: https://docs.syntax-ftc.com/docs/getting-started/install-windows Syntax runs natively on Windows 10 (21H2) and later, and on Windows 11. On machines with an NVIDIA GPU it can run open-weight models locally; on machines without a GPU it can still serve smaller models on CPU and route larger requests to remote backends. The Syntax desktop app and CLI are shipped as native Windows binaries — no WSL or Docker is required for the basic experience. WSL2 is supported for power users who prefer a Linux toolchain. ## System requirements | Component | Minimum | Recommended | |---|---|---| | OS | Windows 10 (21H2) | Windows 11 | | Architecture | x64 or ARM64 | x64 with NVIDIA GPU | | RAM | 16 GB | 32 GB+ for larger models | | Disk | 20 GB free | 100 GB+ for local model weights | | Optional GPU | — | NVIDIA (CUDA) | ## Install (native) Open PowerShell and run: ```powershell iwr -useb https://www.syntax-ftc.com/install.ps1 | iex ``` The installer registers Syntax under your user profile, adds the CLI to your `PATH`, and creates Start Menu entries. Re-run to upgrade. ## Install (WSL2) If you prefer a Linux toolchain, install Syntax inside your WSL2 distribution exactly as you would on Linux: ```bash curl -fsSL https://www.syntax-ftc.com/install.sh | bash ``` You can run the desktop app on Windows and the CLI inside WSL pointing at the same control plane — both reach the local Bridge over `localhost`. ## GPU prerequisites If you have an NVIDIA GPU and want local inference: - Install the latest NVIDIA Studio or Game Ready driver. - Verify `nvidia-smi` works in PowerShell. - For WSL2 GPU serving, follow NVIDIA's CUDA-on-WSL guide; the Linux installer will detect it at first launch. ## Verify ```powershell syntax --version syntax doctor ``` `syntax doctor` reports the detected hardware, the inference engines available on this machine, and any missing dependencies. ## What's next - [First launch](/docs/getting-started/first-launch) - [Connecting a harness](/docs/getting-started/connecting-a-harness) - [Inference → Hardware support](/docs/inference/hardware-support) --- # Claude Code > Use Anthropic's Claude Code CLI with Syntax. Permalink: https://docs.syntax-ftc.com/docs/harnesses/claude-code [Claude Code](https://www.anthropic.com/claude-code) is Anthropic's official CLI for Claude. Syntax connects to it through the Anthropic-compatible surface on the Bridge, so Claude Code works unmodified. ## Connect ```bash syntax connect claude-code ``` The connect flow points Claude Code's configured Anthropic endpoint at the local Bridge. Because Claude Code natively speaks the Anthropic-compatible wire format, no shim is involved — the Bridge accepts those requests and routes them directly. ## What works through Syntax - Anthropic-style streaming, tool calls, and reasoning all pass through. - Any model Syntax exposes — Anthropic-hosted (Claude family), other hosted providers, local open-weight models, managed remote — is reachable from Claude Code through the same endpoint. - Tool definitions, prompt-caching headers, and content blocks all preserve their semantics when routed. ## Why this is interesting Claude Code is built around an Anthropic-style API. Syntax exposes the same shape on a localhost endpoint, which means Claude Code can: - Run a local open-weight model like a Claude clone. - Route to a non-Anthropic hosted provider while still using Claude Code as the UI. - Be combined with other harnesses on the same machine, all sharing the same active model policy through Syntax. ## Disconnect ```bash syntax disconnect claude-code ``` Restores Claude Code's original Anthropic endpoint configuration. ## Where to start - [Connecting a harness](/docs/getting-started/connecting-a-harness) - [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) — how the Bridge exposes its Anthropic-compatible API surface. --- # Codex > Use OpenAI's Codex CLI with Syntax. Permalink: https://docs.syntax-ftc.com/docs/harnesses/codex [Codex](https://openai.com/codex) is OpenAI's official coding CLI. Syntax connects to it via the OpenAI-compatible Bridge, so Codex works unmodified. ## Connect ```bash syntax connect codex ``` The connect flow: - Locates Codex's configuration in the standard location for your OS. - Rewrites the model endpoint to point at the local Bridge. - Backs up the original configuration so `syntax disconnect` can restore it. If Codex isn't installed, the command prints Codex's official install instructions and exits without making changes. ## What works through Syntax - OpenAI-style streaming, tool calls, and reasoning all pass through. - Any model Syntax exposes — OpenAI-hosted, other hosted providers, local open-weight models, or managed remote inference via dUX — is reachable from Codex through the same endpoint. - If you sign in with an OpenAI Plus/Pro subscription, Syntax can route Codex requests to OpenAI's models alongside your OSS stack so simple, everyday, and complex tasks each land on the right backend. ## Disconnect ```bash syntax disconnect codex ``` Restores Codex's original endpoint configuration. ## Where to start - [Connecting a harness](/docs/getting-started/connecting-a-harness) - [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) — how the Bridge exposes its OpenAI-compatible API surface. --- # OpenCode > Use OpenCode with Syntax. Permalink: https://docs.syntax-ftc.com/docs/harnesses/opencode [OpenCode](https://opencode.ai) is an open-source coding agent. Syntax connects to it via the OpenAI-compatible Bridge. ## Connect ```bash syntax connect opencode ``` The connect flow: - Locates OpenCode's JSON configuration in the standard location for your OS. - Rewrites the model endpoint to point at the local Bridge. - Backs up the original configuration. If OpenCode isn't installed, the command prints OpenCode's official install instructions and exits. ## What works through Syntax - Streaming, tool calls, and reasoning all pass through. - Any model Syntax exposes is usable from OpenCode. - OpenCode's session controls work unchanged. ## Disconnect ```bash syntax disconnect opencode ``` Restores OpenCode's original JSON configuration. ## Where to start - [Connecting a harness](/docs/getting-started/connecting-a-harness) - [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) --- # Harnesses overview > The coding assistants Syntax integrates with out of the box, and how the connect/disconnect lifecycle works. Permalink: https://docs.syntax-ftc.com/docs/harnesses/overview A **harness** is the coding assistant you use day-to-day. Syntax ships with its own Codex-based `syntax-cli` and integrates with four third-party harnesses out of the box. Each one keeps using its own UX, its own configuration file, and its own personality — Syntax just sits behind the localhost endpoint they already talk to. | Harness | Type | Streaming | Tool calls | Reasoning | Anthropic-compatible | |---|---|---|---|---|---| | [Syntax CLI](/docs/harnesses/syntax-cli) | Terminal (Syntax-native) | ✓ | ✓ | ✓ | ✓ | | [Codex](/docs/harnesses/codex) | Terminal | ✓ | ✓ | ✓ | ✓ | | [Claude Code](/docs/harnesses/claude-code) | Terminal / IDE | ✓ | ✓ | ✓ | ✓ (native) | | [OpenCode](/docs/harnesses/opencode) | Terminal | ✓ | ✓ | ✓ | ✓ | | [Pi](/docs/harnesses/pi) | Terminal | ✓ | ✓ | ✓ | ✓ | ## How the connect lifecycle works Connecting any harness to Syntax follows the same lifecycle: 1. **Detect.** Syntax checks whether the harness is installed in any of the standard locations for your OS. 2. **Locate config.** It locates the harness's configuration file. 3. **Backup.** It records the current configuration so it can be restored later. 4. **Edit.** It rewrites the configuration to point at the local Bridge, and applies any harness-specific tweaks. 5. **Record.** It writes a small ledger entry under your home directory so the change can always be undone. `syntax disconnect ` walks the ledger entry in reverse: restores the original config, removes the ledger row, and the harness is back exactly the way it was. ## Connect from the CLI ```bash syntax connect codex syntax connect claude-code syntax connect opencode syntax connect pi ``` If a harness isn't installed, the command prints the official install instructions and exits without making changes. Connecting and disconnecting is a CLI-only flow. The Syntax CLI is the only harness with no `connect` step — it ships with Syntax and is available immediately. ## What "connected" means concretely A connected harness: - Sends every chat request to the local Bridge instead of going directly to a provider. - Inherits the active model policy (aliases, per-tier overrides, budgets). - Streams tokens back in the wire format it expects. - Can call any specialist tool the active deployment registers (when it's part of a multi-model party). Multiple harnesses can be connected simultaneously without interfering with each other. ## What this isn't Connecting a harness to Syntax is **not** a fork or a plugin. The harnesses are unmodified upstream binaries / extensions. Only their own configuration files are edited, and only by `syntax connect`. `syntax disconnect` is fully reversible. --- # Pi > Use Pi with Syntax. Permalink: https://docs.syntax-ftc.com/docs/harnesses/pi Pi is a terminal-first coding agent. Syntax connects to it via the OpenAI-compatible Bridge. ## Connect ```bash syntax connect pi ``` The connect flow: - Locates Pi's configuration in the standard location for your OS. - Rewrites the model endpoint to point at the local Bridge. - Backs up the original configuration. If Pi isn't installed, the command prints Pi's official install instructions and exits. ## What works through Syntax - Streaming, tool calls, and reasoning all pass through. - Any model Syntax exposes is usable from Pi. - Pi's session controls work unchanged. ## Disconnect ```bash syntax disconnect pi ``` Restores Pi's original configuration. ## Where to start - [Connecting a harness](/docs/getting-started/connecting-a-harness) - [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) --- # Syntax CLI > Syntax's own native coding agent — always available, no separate install. Permalink: https://docs.syntax-ftc.com/docs/harnesses/syntax-cli The **Syntax CLI** is the agent that ships as part of Syntax. It's a full coding assistant in its own right and is always available without a separate install or `syntax connect` step. The CLI uses the Bridge, of course — like every other harness — but it also has a few Syntax-native capabilities that aren't available through external harnesses. ## Why use the Syntax CLI You'd pick the Syntax CLI when you want: - A coding agent that ships and updates with Syntax itself. - The full Plan Mode workflow as a first-class CLI experience. - The runtime-mode cycle (`Ctrl+M`) and explicit cancellation (`Esc`) with Syntax-native semantics. - Native session persistence and fork/rollback behavior. - Direct access to Syntax-specific tooling (background agents, cron scheduling within a session, team coordination, etc.). ## How it relates to the desktop app The desktop app and the Syntax CLI share the same agent core. A session started in the CLI can be resumed in the desktop app and vice versa. Both talk to the Bridge for inference, both inherit your active model policy, both observe the same approvals. ## Headless / scripted use The Syntax CLI also has a headless mode for CI/CD, background tasks, and scripted automation. In headless mode, the CLI runs without a TUI, prompts are surfaced through structured I/O instead of interactive input, and tool approval is handled by your configured policy rather than per-call confirmation. ## Where to start - Run `syntax-cli --help` for the full harness command list. - [CLI → Syntax Coding Harness](/docs/cli/syntax-cli) for the interactive TUI and the headless `syntax-cli exec` flow. - [CLI Reference](/docs/cli/overview) for the user-visible commands and flags. - [Concepts → Plan Mode](/docs/concepts/plan-mode) and [Runtime Modes](/docs/concepts/runtime-modes) for the agent semantics. --- # Hardware support > What hardware Syntax runs on, and which capabilities each tier unlocks. Permalink: https://docs.syntax-ftc.com/docs/inference/hardware-support Syntax detects your hardware on first launch and chooses the right serving stack for every model you deploy. This page summarizes what's supported and what each tier unlocks. ## Per-machine support matrix | Hardware | OS | LLM serving | Multimodal serving | CPU fallback | |---|---|---|---|---| | **NVIDIA GPU** (modern data-center, e.g., H100/H200/L40 class) | Linux / Windows | ✓ — full coverage | ✓ — including image, video, audio | n/a | | **NVIDIA GPU** (modern consumer, e.g., RTX 40 / 50 series) | Linux / Windows | ✓ — most models | ✓ — many multimodal | n/a | | **NVIDIA GPU** (older consumer, e.g., RTX 30 / 20 series) | Linux / Windows | ✓ — many models | partial | available | | **Apple Silicon** (M1 / M2 / M3 / M4 + Pro / Max / Ultra) | macOS | ✓ — extensive | ✓ — many multimodal | n/a | | **AMD ROCm** (RDNA 3 / CDNA 3 generation) | Linux | ✓ — most models | partial | available | | **CPU only** (modern x86_64 / ARM64) | any | ✓ — smaller models | limited | primary | ## Memory guidance For local LLM serving, plan disk and memory roughly as follows: | Model size | Disk needed for weights | RAM (CPU) | VRAM (GPU) | |---|---|---|---| | ≤ 8B parameters | ~10–20 GB | 16 GB+ | 8–16 GB | | 8–32B parameters | ~30–80 GB | 32 GB+ | 24–48 GB | | 32–70B parameters | ~80–200 GB | 64 GB+ | 48–96 GB | | ≥ 70B parameters | ~200 GB+ | 128 GB+ | 96 GB+ or multi-GPU | These are guidelines; actual requirements depend on the model, the quantization (when applicable), and the engine choice. ## Optional dependencies | Dependency | When it's needed | |---|---| | **Docker** | Optional. Recommended on Linux when you're running engines that ship as containers. The desktop app guides you through enabling Docker if it's not present. | | **NVIDIA driver** | Required on NVIDIA hardware. Syntax expects a recent driver; `syntax doctor` will warn if the version is too old. | | **ROCm runtime** | Required on AMD hardware. Syntax detects ROCm at first launch and falls back to CPU if it's missing. | ## Multi-GPU Multi-GPU is supported on Linux for both NVIDIA and AMD where the underlying engine and the chosen model support tensor- or pipeline- parallel serving. Syntax's autotuner sets the parallelism strategy based on the model and the available GPUs without you having to pick. ## Multi-host Multi-host deployments are supported via the [Remote self-hosted](/docs/inference/remote-self-hosted) and [Managed remote](/docs/inference/managed-remote) targets. For local multi-host workflows, treat each host as a remote target and deploy the party across them. ## Where to go next - [Local inference](/docs/inference/local-inference) — running models on the machine in front of you. - [Multi-engine inference](/docs/differentiators/multi-engine-inference) — why Syntax picks the engine it does. - [Multimodal capabilities](/docs/inference/multimodal) — image, video, audio, 3D, time-series forecasting. --- # Local inference > Running models on your own machine — GPU, Apple Silicon, or CPU. Permalink: https://docs.syntax-ftc.com/docs/inference/local-inference Local inference runs models on the machine Syntax is installed on. It works on everything from a CPU-only laptop to a multi-GPU workstation. ## What's supported | Hardware | Engine class | Notes | |---|---|---| | **NVIDIA GPU (Linux)** | GPU-serving engine tuned for the architecture. | Best supported — most open-weight LLMs and multimodal models work. | | **NVIDIA GPU (Windows)** | GPU-serving engine. | Same coverage as Linux, modern driver required. | | **Apple Silicon** | Native Apple Metal stack. | Excellent for M-series Macs; no container or driver overhead. | | **AMD ROCm** | GPU-serving engine for compatible cards. | Supported for current cards; check the catalog for per-model status. | | **CPU only** | Lightweight CPU serving engine. | Smaller models only. Eligible larger models can also fall back here when GPU VRAM is exhausted by co-tenants. | ## Picking what to run locally The desktop app's **Catalog** page shows recommended models for your detected hardware tier. Cards expose: - **Download Locally** — pull weights to your machine. - A clear indicator if the model won't fit on your hardware so you can pick a smaller variant. Once a model is downloaded, it's available for deployment from the **Deployments** page. ## Deploying a single model locally 1. Open **Deployments → New Deployment**. 2. Pick a category (Chat, General, Coding, Media, Vision, Custom) or pick **Custom** to compose your own. 3. Choose **Local** as the target. 4. Pick a deployment **Mode** (Latency or Throughput). 5. Submit. Syntax's autotuner picks the right engine and parameters for your hardware automatically. The deployment shows up on the **Active Deployments** page once it's serving. ## Deploying a party locally Multi-model parties deploy through the same flow. The Party Builder generates a plan that fits the whole party on your local hardware, relieving VRAM pressure by role tier when needed (see [Inference → Overview](/docs/inference/overview)). ## When local isn't enough - **VRAM-bound** by a model larger than your GPU can hold → consider a smaller variant, a quantized version, or routing to a hosted provider for that model. - **Throughput-bound** by sustained heavy load → consider remote self-hosted or managed remote. - **Cold-start sensitive** when you need a model rarely → routing to a hosted provider is often the right answer. ## Where to go next - [Hardware support](/docs/inference/hardware-support) — full hardware matrix. - [Multi-engine inference](/docs/differentiators/multi-engine-inference) — why Syntax picks the engine it does. - [Concepts → Party Builder](/docs/concepts/party-builder) — deploying multiple models locally as one party. --- # Managed remote (dUX) > dUX-backed cloud GPU. Pick a model, pick a tier, deploy. dUX handles placement, autoscaling, drivers, and ingress. Permalink: https://docs.syntax-ftc.com/docs/inference/managed-remote Managed remote is the path for teams that want cloud GPU without managing infrastructure. It is available to all users and uses **dUX** to orchestrate the hardware inside your own cloud accounts — you remain the sole admin of the underlying machines. ## How it works (at a glance) 1. You pick a model (or a party) and a target tier in the desktop app. 2. Syntax submits the deployment intent to dUX. 3. dUX handles the cloud-side work: GPU placement, autoscaling, driver compatibility, ingress, isolation. 4. dUX returns the endpoint(s). 5. Syntax wires those endpoints into the Bridge. 6. Your harness sees the managed-remote deployment as a normal model in its model list and routes transparently. ## Target tiers Two managed-remote tiers map to two optimization profiles: | Tier | Optimized for | |---|---| | **Latency** | Lowest time-to-first-token, lowest per-request latency. | | **Throughput** | Highest tokens-per-second under load, best cost-per-token. | The exact placement strategy and replica policy live in dUX's orchestration layer; from your perspective, you choose Latency or Throughput and dUX handles the rest. ## Saved remote targets The first time you deploy a model managed remotely, you can choose to **save the target** — name, tier, exposure, replica policy, and anything else you configured. Future deploys to the same logical target are one click. ## Public vs private endpoints When you deploy managed remote, you can set: - **Expose private** — the endpoint is reachable from your other Syntax tools but not from the public internet. - **Expose public** — the endpoint is reachable from anywhere with the bearer token, suitable for sharing with non-Syntax tools. Both surfaces issue a per-deployment bearer token (`sk-syntax-…`) that's scoped to the deployment and can be revoked at any time. See [Concepts → Exposed endpoints](/docs/concepts/exposed-endpoints) for the revocation flow. ## What you don't have to think about - GPU drivers and CUDA / ROCm versions. - Autoscaler configuration (KEDA, DCGM, etc.). - Kubernetes namespaces. - Ingress and load balancing. - Multi-replica weight distribution. - Node-pool capacity planning. dUX orchestrates all of that for you, inside your own cloud accounts, with you as the sole admin. Syntax stays your control surface. ## Multi-model parties on managed remote The same Party Builder that composes parties for local deployment can also deploy them to managed remote. dUX returns placements and Syntax wires every model in the party into the Bridge so the Main Agent can call its specialists transparently. ## Where to go next - [Syntax × dUX → Overview](/docs/dux-integration/overview) --- # Multimodal capabilities > Image, video, audio, 3D, UI grounding, OCR, and time-series forecasting — all reachable through the same Bridge. Permalink: https://docs.syntax-ftc.com/docs/inference/multimodal Syntax isn't limited to text. The catalog includes models for a wide range of modalities, and the Bridge exposes them as tools the main agent can invoke alongside text generation. ## Supported modalities | Modality | Examples of what's supported | |---|---| | **Text generation** | Chat, code, reasoning, structured outputs. | | **Embedding** | Sentence and code embeddings for semantic search. | | **Reranking** | Listwise reranking for retrieval pipelines. | | **Image understanding** | Vision-language models that look at images and answer questions. | | **OCR** | Optical character recognition. | | **Image processing** | Style transfer, restoration, adjustment. | | **Image generation** | Text-to-image, image-to-image diffusion. | | **Video processing** | Temporal segmentation, video Q&A. | | **Video generation** | Text-to-video, image-to-video. | | **Segmentation** | Image and video segmentation. | | **TTS (text-to-speech)** | High-quality speech synthesis. | | **Audio generation** | Music and effect generation, V2A Foley. | | **Audio transcription** | Speech-to-text. | | **Speech-to-speech** | Voice transformation, style transfer. | | **Mesh recovery** | 3D mesh from images or video. | | **UI grounding** | Locate UI elements in screenshots. | | **Time-series forecasting** | Foundation-model forecasting (Chronos, TimesFM, MOMENT, Granite-TTM, etc.). | ## How multimodal capabilities surface to your harness Each multimodal capability is exposed as a **tool** the main agent can invoke. When a multimodal model is deployed, the Bridge registers its capability — `generate_image`, `transcribe_audio`, `segment_image`, `text_to_speech`, etc. — so the main agent can pick the right tool when the user's request needs it. The capability set is **dynamic**: it's recomputed every time you deploy or undeploy. If you have an image generator deployed today and remove it tomorrow, the agent stops seeing `generate_image` as an available tool. ## Engine selection for multimodal Each modality is served by the engine class best suited to it: - LLMs and vision-language models run on GPU-serving engines. - Image and video generation run on diffusion-friendly engines. - Specialized non-LLM models (OCR, segmentation, TTS, audio generation, mesh recovery, UI grounding, time-series forecasting) run on a serving framework optimized for those workloads. - On Apple Silicon, the Apple-native stack handles eligible models. The autotuner picks all of this for you — see [Multi-engine inference](/docs/differentiators/multi-engine-inference). ## Where to go next - [Models → Modalities](/docs/models/modalities) — modality-by-modality capability summary. - [Models → Purposes](/docs/models/purposes) — the full list of Model Purpose categories. - [Concepts → Party Builder](/docs/concepts/party-builder) — adding multimodal specialists to a party. --- # Inference overview > How Syntax serves models — local, remote self-hosted, managed remote on dUX, and hosted providers. Permalink: https://docs.syntax-ftc.com/docs/inference/overview Every model Syntax exposes ends up running somewhere. The four **inference targets** are: | Target | Where it runs | Best for | |---|---|---| | **Local** | Your machine — GPU, Apple Silicon, or CPU. | Solo workflows, privacy-first, no network dependency. | | **Remote self-hosted** | A box you've provisioned (your server, your GPU, your SSH). | Power users with their own hardware. | | **Managed remote (dUX)** | dUX-managed cloud GPU. | Teams that want managed infrastructure. | | **Hosted provider** | OpenAI, Anthropic, Google, etc. | Frontier models, predictable cost, no infra. | All four are reachable through the same Bridge endpoint. Your harness doesn't know — or care — which one is serving any given request. ## How Syntax decides what to run Two layers make the decision: 1. **Routing.** When a request arrives at the Bridge, the active model policy picks which deployment serves it. If a model is deployed in multiple places (e.g., locally and on managed remote), routing picks based on your preferences. 2. **Engine selection.** For local and remote-self-hosted serving, Syntax's **autotuner** picks the most efficient serving engine for the chosen model and your hardware — see [Differentiators → Multi-engine inference](/docs/differentiators/multi-engine-inference). You can override either layer. Aliases let you pin a name to a specific deployment; per-deployment configuration lets you override engine choices when you need to. ## Multi-model deployments When you deploy a multi-model party — a Main Agent, a Default Sub-Agent, and up to six Specialists — the inference plane plans holistically: - All models in the party share the same target (local, self-managed remote, or managed remote — but not mixed). - The autotuner places each model on the available hardware in role order so the Main Agent gets the best resources. - VRAM pressure is relieved by tier when needed: specialists first, the sub-agent second, the Main Agent only as a last resort. Eligible smaller models can fall back to CPU automatically. ## Targets in depth - [Local inference](/docs/inference/local-inference) — GPU / Apple Silicon / CPU on your own machine. - [Remote self-hosted](/docs/inference/remote-self-hosted) — your own SSH-reachable hardware. - [Managed remote](/docs/inference/managed-remote) — dUX-backed cloud GPU. - [Hardware support](/docs/inference/hardware-support) — what runs on what. - [Multimodal capabilities](/docs/inference/multimodal) — image, video, audio, 3D, time-series forecasting. --- # Remote self-hosted > Run models on your own remote box — your server, your GPU, your SSH — with Syntax handling the lifecycle. Permalink: https://docs.syntax-ftc.com/docs/inference/remote-self-hosted Remote self-hosted is for users who already have a GPU server (a beefy home tower, a colo box, a cloud VM you provisioned yourself) and want Syntax to drive it without giving up control of the machine. ## What "remote self-hosted" means You provide: - An SSH-reachable host with the right hardware (GPU, RAM, disk). - An account / key Syntax can use to log in. Syntax handles: - Engine installation on the remote host (curated images, no manual driver wrangling beyond the GPU driver itself). - Model weight delivery. - Engine lifecycle (start, stop, health checks). - Wiring the resulting endpoint into the local Bridge so your harness reaches the remote model the same way it reaches a local one. ## Setting up a remote target From **Settings → Remote Targets** in the desktop app: 1. Add the host (hostname, port, username). 2. Provide an SSH key that's authorized on the host. 3. Test the connection — Syntax verifies it can reach the host, has access to the right paths, and can probe the GPU. 4. Save. Once a remote target is saved, deploying a model to it is the same flow as a local deployment — just pick **Self-Managed Remote** as the target. ## Disk layout on the remote host Syntax keeps remote artifacts under a small set of well-known paths in your home directory on the remote host. Weights, engine binaries, and log files all live in predictable locations so they're easy to clean up if you ever decide to remove Syntax. ## Engine selection on the remote host The same multi-engine inference logic that runs locally also runs on the remote host: Syntax picks the right engine for the model and the remote hardware. You don't have to install or manage CUDA, ROCm, attention backends, or quantization toolchains by hand. ## Multi-host remote deployments Some models are too large to fit on a single host. For multi-host deployments, you provide multiple remote targets and pick a **Strategy**: - **Performance** — one model per host (lowest latency). - **Economy** — pack onto the fewest hosts (lowest cost). The strategy applies to multi-model parties only; single-host targets ignore it. ## When to use remote self-hosted - You already own a GPU server. - You want full control of the OS and drivers. - You want SSH-level visibility into the running process. - You don't want managed cloud GPU pricing or vendor lock-in. ## When to use managed remote (dUX) instead - You don't have hardware and don't want to provision and maintain it. - You want autoscaling without writing it yourself. - Your team needs shared deployments behind a single endpoint. → [Managed remote](/docs/inference/managed-remote) --- # How Syntax works > A high-level walkthrough of Syntax's three planes, the Bridge, and what happens to a request from your editor to a model and back. Permalink: https://docs.syntax-ftc.com/docs/introduction/how-it-works Syntax is built around three planes that work together but stay clearly separated. You don't have to understand every layer to use Syntax — but the mental model below is enough to predict what will happen for any given configuration. ## The three planes | Plane | What it owns | |---|---| | **Control** | Identity, organization policy, secrets, budgets, audit logs. | | **Execution** | Your sessions, the harness lifecycle, the local proxy, approvals, tool orchestration. | | **Inference** | The model catalog, hardware detection, engine selection, autotuning, model lifecycle. | The three planes are deliberately decoupled. The control plane never sees the content of your sessions; the execution plane never has to think about how a model is autotuned for your specific GPU; the inference plane never reaches into your editor. ## First-class inter-compatibility Every supported coding assistant talks to a single OpenAI- and Anthropic-compatible endpoint on `localhost`. That endpoint is the **Bridge** — the piece of Syntax that accepts requests in the format your harness already speaks and routes them to the right backend. ## What happens to a request 1. Your harness sends a chat request to its configured endpoint, which is actually Syntax's local Bridge. 2. The Bridge resolves the requested model name against your active model policy (alias resolution, tier overrides, budget caps). 3. The Bridge picks a backend — local engine, remote self-hosted engine, dUX-managed remote, or a hosted provider — based on what's deployed and what your policy allows. 4. The chosen backend serves the request. Local serving uses the most efficient engine for your hardware (see [Multi-engine inference](/docs/differentiators/multi-engine-inference)). 5. Tokens stream back to your harness in the wire format it expects, so streaming, tool calls, reasoning, and multimodal content all render correctly. ## What you control Syntax exposes a small set of high-leverage knobs: - **Model policy** — which models are allowed for which tiers, with aliases and per-deployment overrides. - **Routing strategy** — Latency vs Throughput, Performance vs Economy on multi-host deploys, public vs private endpoint exposure. - **Approvals** — what tool calls your harness is allowed to run without asking, and how risky operations get gated. - **Budgets** — hard caps and soft warnings for tokens or compute, per user and per organization. ## Where this plays out - The Bridge — what it is and why every harness talks to it — is covered in [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility). - The catalog and inference engines are covered in [Inference](/docs/inference/overview). - The dUX-backed managed remote story is in [Syntax × dUX](/docs/dux-integration/overview). --- # What is Syntax? > Syntax is your fully managed, privately owned, general-purpose AI factory. Permalink: https://docs.syntax-ftc.com/docs/introduction/what-is-syntax ## The Problem Syntax Solves AI usage is exploding. Whether you use AI for coding, as a personal assistant, for media generation, or to power novel agentic products, compute and token costs add up quickly. As models become more capable, utilizing them at scale becomes increasingly expensive. As businesses increasingly integrate AI agents into their core workflows, and with per-task token consumption being highly unpredictable, they introduce a volatile and escalating OpEx component that can threaten the viability of entire business models. Privacy and compliance present another major hurdle. With foundational model providers expanding into diverse domains, from healthcare to legal to finance, companies risk exposing proprietary data, code, and business logic, which could inadvertently be used to train a competitor's model. Additionally, strict data sovereignty laws often prohibit organizations from passing sensitive information through off-the-shelf commercial APIs. Finally, efficiently deploying and scaling the latest high-performance OSS models is a complex engineering challenge. Even with the assistance of AI, building out this infrastructure remains inaccessible to many companies lacking specialized MLOps and systems-level talent. ## Syntax's Solution The integration of Syntax and dUX addresses these three core issues from the ground up. Syntax operates as a built-in application within dUX, designed to instantly and effortlessly deploy OSS and proprietary AI on local, private, or fully managed compute resources. It supports a vast catalog of models across dozens of categories, each pre-configured for either low-latency or high-throughput efficiency across a wide array of hardware architectures. Syntax natively supports high-performance deployments via SGLang, vLLM, Triton, llama.cpp, and Whisper.cpp, automatically routing each model to its most suitable inference engine, with automatic scaling from 0 to infinity already wired-in. Unlike token-based APIs, dUX charges exclusively for the hourly usage of underlying compute resources. Billing is completely decoupled from token consumption or API calls, insulating you from the unpredictable behavior of autonomous AI agents. You pay only the base infrastructure costs of the chosen cloud provider + dUX's premium, with the Syntax platform itself provided at zero additional costs! Because costs are strictly capped by the hourly rate of the provisioned hardware, businesses can accurately forecast their cloud expenditures, transforming unpredictable OpEx into a known, manageable expense. Furthermore, Syntax and dUX guarantee absolute privacy. Clients act as the sole administrators of any provisioned resources. Native integration with your own secrets management systems creates a strict technical guarantee — rather than a reliance on vendor trust — ensuring that neither our team nor any third party can access your infrastructure or data. Finally, by allowing clients to select specific cloud providers and regions, and by fully supporting deployments on private, on-premise, or even air-gapped infrastructure, Syntax enables organizations to harness frontier AI capabilities while maintaining strict compliance with all local and industry regulations. ## Where to Start - New to Syntax? Continue with [How it works](/docs/introduction/how-it-works). - Ready to install? Pick your platform under [Getting Started](/docs/getting-started/install-macos). --- # Why Syntax? > The reasons teams pick Syntax over rolling their own AI deployment stack. Permalink: https://docs.syntax-ftc.com/docs/introduction/why-syntax There are plenty of ways to call an LLM from a coding tool or product. The reason teams pick Syntax — and stay on it — is that it solves a handful of hard problems together that are usually solved separately and badly. ## 1. Predictable Costs dUX bills hourly for compute, not per token. Billing is decoupled from token consumption entirely, so autonomous agents can't quietly run up a bill — costs are capped by the hourly rate of the hardware you've provisioned. The Syntax platform itself is provided at zero additional cost; you pay the underlying provider plus dUX's premium and nothing else. ## 2. Private & Secure Managed remote deployments run on WireGuard-based internal networks. Endpoints are not reachable from the public internet unless you explicitly opt in by issuing a public exposed bearer; private exposures stay entirely within your perimeter. Neither Syntax nor dUX will ever access your machines, data, logs, files, or code. Integrate dUX with your own secrets manager and that becomes a *technical* guarantee — not a vendor trust statement — that no one outside your org can reach your infrastructure. Enterprise tenants run in fully isolated environments; on-premise and air-gapped deployments are first-class. ## 3. Optimized deployments Choosing how to run a given model is a real engineering problem. The right combination of hardware SKU, cloud instance type, serving engine, attention backend, quantization format, and parallelism strategy differs for every model — and a model that's "supported" by two different engines may only run well on one of them. Syntax's autotuner navigates that decision space for you, across the catalog and across the available hardware. When you deploy a multi-model party, the same autotuner plans the whole party: packing co-tenants where it saves cost, isolating models that would harm each other's latency, and propagating your Performance vs Cost-optimized tier across every model. You choose a tier and a target; everything below is handled automatically and scales from zero to whatever sustained traffic demands. → [Differentiators → Multi-engine inference](/docs/differentiators/multi-engine-inference) ## 4. Use the harness you already love Syntax doesn't ask you to switch editors or learn a new IDE. Codex, Claude Code, OpenCode, Pi, and the Syntax-native `syntax-cli` all work out of the box. The integration is reversible: `syntax connect` edits the harness's own configuration to point at Syntax, and `syntax disconnect` puts it back exactly the way it was. The point isn't just compatibility — it's that you can keep the tool you already trust while shifting the workload underneath it onto cost-efficient OSS models you deploy yourself. The harness doesn't care; you get the same UX with a fraction of the per-task cost. → [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) ## 5. Pick the best model for the job A real workflow needs a strong main model, a cheaper sub-agent, and sometimes specialists for things like image understanding, OCR, search, embeddings, or image generation. Syntax's **Models Party** lets you compose those into a single deployment with one main agent, one default sub-agent, and up to six specialists — and the main model can call specialists as tools. This is where the cost story compounds. Most "frontier model" workloads are actually a mix of simple, routine, and genuinely hard sub-tasks. A well-composed party routes the simple and routine work to small, cheap models and reserves frontier capacity for the small fraction of tasks that actually need it. → [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties) ## 6. Scale to the cloud without managing infrastructure For team and production workloads, the same model name you ran locally resolves on managed remote infrastructure via dUX, with autoscaling from zero to whatever traffic demands already wired in across dozens of public cloud providers. dUX provisions the hardware inside your own cloud accounts — you remain the sole admin, and you describe what you want; dUX handles placement, drivers, autoscaling, ingress, and lifecycle. → [Syntax × dUX → Overview](/docs/dux-integration/overview) ## And one bonus: AI-agent-friendly These docs ship as a normal website *and* as a Markdown corpus an AI agent can ingest in one fetch. Every page has a raw-Markdown sibling URL. There's a [`llms.txt`](/llms.txt) index, a [full corpus](/llms-full.txt), and a [JSON sitemap](/api/sitemap.json). When an agent in your codebase needs to reason about Syntax's capabilities, point it here. → [Differentiators → AI-agent-friendly](/docs/differentiators/ai-agent-friendly) --- # Catalog overview > Hundreds of models across many purposes, all reachable through the same Bridge. Permalink: https://docs.syntax-ftc.com/docs/models/catalog-overview The **catalog** is Syntax's curated set of models. It includes hundreds of open-weight and provider-hosted models across every model purpose Syntax supports — text generation, embedding, image generation, video generation, OCR, segmentation, TTS, audio generation, mesh recovery, UI grounding, audio transcription, speech-to-speech, image processing, video processing, reranking, and time-series forecasting. ## What's in the catalog Each catalog entry includes everything Syntax needs to serve the model without you having to wire it up by hand: - The model's identity (name and provenance). - Its model purpose (e.g., text generation, image generation, OCR). - Its modalities (text, image, video, audio). - Recommended serving parameters for the engines Syntax can use. - The model's license, so attribution and usage requirements are surfaced before you deploy. When a model is in the catalog, deploying it is a one-click operation on any supported target — local, self-managed remote, managed remote, or routed to a hosted provider. ## Open-weight vs provider-hosted The catalog mixes two kinds of models: - **Open-weight** models that Syntax can serve directly on your local hardware, on a self-hosted remote, or on managed remote (dUX). The weights are downloaded and served by Syntax. - **Provider-hosted** models that Syntax routes to (OpenAI, Anthropic, Google, etc.). You bring the API key; Syntax handles the routing, alias resolution, and budget tracking. A single model can have both faces — for example, a frontier model reachable through a hosted provider and an open-weight equivalent you can run locally — and Syntax routes per session based on your preferences. ## How to browse The desktop app's **Catalog** page is the primary surface for browsing models. It supports: - Search by name or capability. - Filter by model purpose, modality, size, license, and curated tier. - Sort by recency, parameter count, or curated rank. The Catalog page is read-only — you browse here, then move to the **Deployments** page (or the Party Builder) when you're ready to deploy. ## Per-model deployment options Every catalog entry exposes: - **Download Locally** when local serving is supported on your hardware. - **Download Remotely** when you have a self-managed remote target configured. - **Serverless activation** when the model is reachable through a hosted provider you've configured. Whether a button is enabled depends on your hardware tier and your configured providers — Syntax greys out options that won't work rather than letting you pick something that will fail. ## Where to go next - [Models → Purposes](/docs/models/purposes) — the model purpose taxonomy. - [Models → Modalities](/docs/models/modalities) — what each modality means in practice. - [Models → Reasoning models](/docs/models/reasoning-models) — how reasoning effort flows through the Bridge. - [Models → Tool use](/docs/models/tool-use) — how tool calls work across model families. - [Models → Licensing](/docs/models/licensing) — license display and attribution. --- # Licensing & attribution > Every model in the catalog declares its license. Syntax surfaces it before you deploy and at runtime. Permalink: https://docs.syntax-ftc.com/docs/models/licensing The Syntax catalog is currently restricted to models whose licenses permit general commercial use, with a strong preference for models under permissive licenses such as MIT or Apache 2.0. Every model in the catalog declares its license. Syntax surfaces this information at three points so you always know the license you're working under. ## Where you see the license 1. **At browse time.** The Catalog page shows the license on every model card. You can also filter by license family. 2. **At deploy time.** Before you confirm a deployment, Syntax shows the license again with any usage notes (e.g., research-only, commercial-use restrictions, redistribution rules). 3. **At runtime.** Each deployed model's status page shows its license and any attribution requirements. Models that require visible attribution badges in shipped products surface that requirement clearly. ## EULA gates Some models ship under an end-user license agreement that requires explicit acceptance before download. For those, Syntax displays the EULA and waits for you to accept before pulling weights. The acceptance is recorded so you don't have to re-accept on subsequent deployments of the same model. ## Provider terms For provider-hosted models (OpenAI, Anthropic, Google, etc.), the relevant terms are the provider's own. Syntax surfaces a link to the provider's usage policy on the catalog card and during deployment. ## Per-model READMEs When a model has an upstream README — for example, on Hugging Face — Syntax downloads it alongside the weights. The README is reachable from the deployed model's status page, so the canonical model card is right there if you need to check details. ## Why this matters License surface area is real. A model that's permissive for research but restrictive for commercial use, or that requires attribution in the resulting product, or that excludes specific use cases, can become a compliance issue if it's deployed without that information visible. Syntax makes the license a first-class piece of metadata so the right team can see it before the weights ever get downloaded. ## Where to go next - [Catalog overview](/docs/models/catalog-overview) --- # Modalities > Text, image, video, and audio — what each modality means in Syntax and how multimodal models surface to your harness. Permalink: https://docs.syntax-ftc.com/docs/models/modalities Where **purpose** is what a model is for, **modality** is what kind of data the model accepts and emits. Models can be unimodal (text only) or multimodal (text + image, or text + audio, etc.). ## The four common modalities | Modality | Meaning | |---|---| | **Text** | Tokens — chat, code, structured outputs. | | **Image** | Still images, in or out. | | **Video** | Sequences of frames, in or out. | | **Audio** | Audio waveforms — speech and non-speech. | A vision-language model is "text + image" in. A diffusion image generator is "text" in, "image" out. A speech-to-speech model is "audio" in, "audio" out. A multimodal LLM might accept all four. ## How multimodal LLMs work through Syntax When you deploy a multimodal LLM (text + image, text + audio, etc.): - The model is registered with its declared modalities. - The Bridge accepts content blocks (image URLs, base64-encoded images, audio chunks) in the appropriate API surface and routes them to the model. - Streaming, tool calls, and reasoning continue to work alongside multimodal input. If your harness sends a multimodal request to a unimodal model, the Bridge returns a clear error rather than silently dropping the non-text content. ## How non-LLM multimodal models work through Syntax Models with non-LLM modalities — image generators, OCR, segmenters, TTS, audio generators, mesh recovery, UI grounding, time-series forecasting — surface as **tools** on the main agent rather than chat-completion targets. Concretely: when an image generator is deployed, the main agent sees a `generate_image` tool. When the user's request needs an image, the agent calls the tool, the tool runs the model, and the result is folded back into the conversation. The same pattern applies to every non-LLM modality. ## Capability scoring in the Party Builder The Party Builder uses modality coverage as part of its capability scoring. When you compose a party, you can see at a glance: - Which input modalities your party can handle. - Which output modalities your party can produce. - Where there are gaps — for example, "no image generation in this party" or "no audio transcription". Picking a specialist that closes a gap is a single click. ## Where to go next - [Models → Purposes](/docs/models/purposes) — the purpose taxonomy. - [Inference → Multimodal capabilities](/docs/inference/multimodal) — what each modality looks like at runtime. - [Concepts → Party Builder](/docs/concepts/party-builder) --- # Model purposes > The coarse classification Syntax uses to know how each model should be served. Permalink: https://docs.syntax-ftc.com/docs/models/purposes A **model purpose** is the coarse classification of what a model is for. Syntax uses model purpose to decide: - Which serving engine is right for the model. - Whether the model surfaces as a tool to the main agent (and which tool name). - Which UI surfaces should expose it (Catalog filters, Party Builder capability scoring, etc.). ## The current purposes | Purpose | What it is | |---|---| | **Generation** | Text generation — the LLMs your harness chats with. | | **Embedding** | Sentence and code embeddings for retrieval. | | **Reranking** | Listwise reranking on top of retrieval results. | | **OCR** | Extract text from images. | | **ImageProcessing** | Style transfer, restoration, adjustment. | | **VideoProcessing** | Temporal segmentation and video Q&A. | | **ImageGeneration** | Text-to-image and image-to-image diffusion. | | **VideoGeneration** | Text-to-video and image-to-video. | | **Segmentation** | Pixel-precise segmentation for images and video. | | **TTS** | Text-to-speech synthesis. | | **AudioGeneration** | Music, effects, and V2A Foley. | | **MeshRecovery** | 3D mesh from images or video. | | **UIGrounding** | Locate UI elements in screenshots. | | **AudioTranscription** | Speech-to-text. | | **SpeechToSpeech** | Voice transformation, style transfer. | | **TimeSeriesForecasting** | Foundation-model time-series forecasting. | ## Why this matters When you compose a multi-model party, the Party Builder uses model purposes to: - Show capability coverage — which purposes your party covers and where there are gaps. - Suggest specialists for purposes the main agent doesn't cover well. - Register the right tool name so the main agent can invoke each specialist correctly (`generate_image`, `transcribe_audio`, `segment_image`, etc.). When you deploy a single model, the model purpose determines which serving engine Syntax picks — see [Differentiators → Multi-engine inference](/docs/differentiators/multi-engine-inference). ## How a new purpose lands The set of purposes is expandable; new purposes are added as the model ecosystem grows. The most recent addition is **TimeSeriesForecasting**, covering foundation-model time-series forecasters. When a new purpose ships, it lights up automatically in the Catalog filters, the Party Builder capability scoring, and the engine routing. ## Where to go next - [Catalog overview](/docs/models/catalog-overview) - [Models → Modalities](/docs/models/modalities) — modality vs. purpose. - [Differentiators → Multi-model parties](/docs/differentiators/multi-model-parties) --- # Reasoning models > How reasoning effort flows through Syntax — three distinct mechanisms behind one consistent control. Permalink: https://docs.syntax-ftc.com/docs/models/reasoning-models Modern frontier models support **reasoning** — explicit thinking before answering. Different model families implement reasoning differently, and Syntax normalizes those differences behind one consistent control so your harness doesn't have to know. ## The three reasoning mechanisms | Mechanism | Used by | What "reasoning effort" maps to | |---|---|---| | **Native API reasoning** | Provider-native APIs that already expose a reasoning field (e.g., the Anthropic and Google reasoning fields, OpenAI's `reasoning.effort`, DeepSeek's native think modes). | Directly forwarded to the provider in the format they expect. | | **Mechanism A** | Open-weight models that expose an explicit "thinking" toggle and budget through their chat template. | Translated into the model's native thinking control plus a budget appropriate to the model's context length. | | **Mechanism B** | Models without a native thinking control. | An orchestrator runs a planning → execution → critique → repair → verify loop on top of the model. | ## What you see in the harness Across all three mechanisms, your harness sets a **reasoning effort** (low / medium / high). Syntax translates that into whatever the specific model's family expects and the response comes back with the reasoning fields intact. The harness doesn't have to know which mechanism is in use. ## Reasoning enabled by default When a model is deployed and the catalog declares it as a reasoning model, Syntax enables reasoning by default at sensible levels: - **Mechanism A** models get their native thinking toggle on, with a budget appropriate to the model's context length (long-context variants get a larger budget). - **Mechanism B** models get the orchestrator at "high" effort. - **Native-API reasoning** models get the provider's "high" or equivalent setting. You can override at any level — per session, per request, or per deployment. ## Reasoning + tool calls Reasoning and tool calls compose. Reasoning models can plan, call tools, and incorporate tool results back into their reasoning before answering. The Bridge preserves both the reasoning and the tool-call fields end-to-end. ## Where to go next - [Models → Tool use](/docs/models/tool-use) - [Concepts → Plan Mode](/docs/concepts/plan-mode) — a different flavor of structured reasoning, applied to the whole task. - [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) — how the Bridge surfaces the reasoning channel through Chat Completions and Messages. --- # Tool use > How tool calls work through Syntax — across providers, across engines, across model families. Permalink: https://docs.syntax-ftc.com/docs/models/tool-use Tool use — the model deciding to call a function and the harness executing it — is a core part of any modern coding workflow. Syntax normalizes tool use across model families so your harness sees a consistent interface. ## What works through the Bridge When your harness sends a request with tools defined, you get: - **Auto tool choice.** The model picks when to call tools and when to answer directly. - **Streamed tool calls.** Tool-call deltas stream as they're generated; your harness can render or execute them as they arrive. - **Tool-call preservation across turns.** Tool calls and their results are preserved in conversation history so the model can reason about them later. - **Mixed content.** A response can mix natural-language text and tool calls in one turn. ## Across model families Different model families expose tool calling differently. Some use OpenAI-style function calls, some use Anthropic-style tool blocks, some use a model-specific chat template. Syntax handles the translation: - **Provider-hosted models** (OpenAI, Anthropic, Google, etc.) get their native tool-call format. - **Open-weight models** with a native tool-call chat template get their template's expected format. Where applicable, the Bridge also enables the engine's auto-tool-choice path so the model reliably emits structured tool calls instead of free-form text. - **Open-weight models without** a native tool-call template fall back to a structured-output approach that produces parseable tool calls regardless. The harness sees the same OpenAI- or Anthropic-shaped tool calls either way. ## Specialists as tools When you deploy a multi-model party, every specialist is registered as a tool the main agent can call. ## Approvals on tool calls Tool-call execution is gated by your active **Runtime Mode**. In Default, every meaningful tool call asks for confirmation before running; in AutoEdit, common edits and routine commands run unattended; in Bypass, everything runs without asking. See [Concepts → Runtime Modes](/docs/concepts/runtime-modes). ## Where to go next - [Models → Reasoning models](/docs/models/reasoning-models) — reasoning composes with tool use. - [Concepts → Specialist Models](/docs/concepts/specialist-models) — how specialists become tools. - [Differentiators → First-class inter-compatibility](/docs/differentiators/first-class-inter-compatibility) — how the Bridge surfaces tool definitions through Chat Completions and Messages. --- # Linux > Platform notes for running Syntax on Linux. Permalink: https://docs.syntax-ftc.com/docs/platforms/linux Syntax runs on x86_64 and aarch64 Linux. Tested on Ubuntu 20.04+, Debian 11+, and Fedora 38+; works on most modern glibc-based distributions. ## What works on Linux - The full desktop app (Wayland and X11) and CLI. - Local inference on NVIDIA GPUs — the smoothest GPU path. - Local inference on AMD ROCm — supported for compatible cards. - Local inference on CPU for smaller models. - All seven coding harnesses. - Self-managed remote and managed-remote inference. - Multi-GPU on a single host where the model and the engine support it. ## NVIDIA notes - Install a recent proprietary NVIDIA driver (≥ 545 recommended). - `nvidia-smi` should work in your shell. - Docker is optional; if you have it installed, Syntax can use containerized engines for some models. The desktop app guides you through enabling Docker if it isn't present. ## AMD ROCm notes - Install the ROCm runtime that matches your card and distribution. - `rocminfo` should work. - Coverage varies by model — `syntax doctor` reports what's available for your specific card. ## CPU-only notes - Smaller LLMs (≤ ~8B parameters with GGUF support) work well on modern CPUs. - Multimodal models are limited on CPU; route those to a hosted provider or to a remote target. ## Wayland and X11 The desktop app supports both Wayland and X11. On distributions where X11 is the legacy fallback, Syntax detects the active session and uses the right toolchain automatically. ## Headless servers The CLI works on headless servers without any of the desktop toolchain. This is the typical setup for CI runners and self-managed remote targets — you can install the CLI alone, configure the Bridge, and serve models without ever bringing up the GUI. ## Where to go next - [Install on Linux](/docs/getting-started/install-linux) - [Hardware support](/docs/inference/hardware-support) --- # macOS > Platform notes for running Syntax on macOS. Permalink: https://docs.syntax-ftc.com/docs/platforms/macos Syntax runs on macOS 12 (Monterey) and later, on both Apple Silicon and Intel Macs. Apple Silicon is the recommended path. ## What works on macOS - The full desktop app and CLI. - Local inference on Apple Silicon via the Apple-native engine — fast, low-overhead, no Docker required. - Local inference on Intel Macs via the CPU path (smaller models only). - All seven supported coding harnesses; `syntax connect` knows the macOS-standard paths for each. - Self-managed remote and managed-remote inference. ## Apple Silicon notes - The unified-memory architecture means you get GPU-class throughput on models that fit in RAM. - Many vision-language and multimodal models work well on M-series chips. - The Apple-native engine doesn't require Docker, drivers, or any manual setup beyond installing Syntax. ## Intel Mac notes - Local LLM serving on Intel Macs uses the CPU engine. Stick to smaller models. - For larger workloads, route to a hosted provider, run a self-managed remote target, or use managed remote. ## Install paths Standard install: ```bash curl -fsSL https://www.syntax-ftc.com/install.sh | bash ``` The desktop app appears in `/Applications`; the CLI appears in your user-local `bin` directory and is added to your `PATH`. ## Where Syntax stores data Syntax stores its configuration, model weights, and per-user state under your home directory. The desktop app's **Settings → Storage** page shows the exact paths and total disk usage. ## Where to go next - [Install on macOS](/docs/getting-started/install-macos) - [Hardware support](/docs/inference/hardware-support) --- # Windows > Platform notes for running Syntax on Windows. Permalink: https://docs.syntax-ftc.com/docs/platforms/windows Syntax runs natively on Windows 10 (21H2) and later, and on Windows 11. Both the desktop app and the CLI ship as native Windows binaries; WSL2 is supported for users who prefer a Linux toolchain. ## What works on Windows - The full desktop app (native, no WSL required). - The CLI as a native Windows binary, registered on your `PATH`. - Local inference on NVIDIA GPUs. - CPU-only inference on machines without a GPU. - All seven coding harnesses; `syntax connect` knows the Windows-standard config paths for each. - Self-managed remote and managed-remote inference. ## Native install PowerShell: ```powershell iwr -useb https://www.syntax-ftc.com/install.ps1 | iex ``` The installer registers Syntax under your user profile, adds the CLI to your `PATH`, and creates Start Menu entries. ## NVIDIA on Windows - Install the latest NVIDIA Studio or Game Ready driver. - `nvidia-smi` should work in PowerShell. - The native Windows path covers most LLM serving needs without WSL. ## WSL2 If you'd rather use a Linux toolchain, install Syntax inside your WSL2 distribution exactly as you would on Linux. The desktop app on Windows and the CLI inside WSL can share the same control plane; both reach the local Bridge over `localhost`. For NVIDIA GPU serving inside WSL2, follow NVIDIA's CUDA-on-WSL guide; the Linux installer detects the GPU at first launch. ## ARM64 Windows ARM64 Windows is supported. CPU-bound workloads work natively; GPU acceleration depends on the specific ARM64 hardware (e.g., NPU support on Snapdragon X is evolving — `syntax doctor` reports what it can use on your specific machine). ## Where to go next - [Install on Windows](/docs/getting-started/install-windows) - [Hardware support](/docs/inference/hardware-support)