Syntax Docs

Hardware support

Syntax detects your hardware on first launch and chooses the right serving stack for every model you deploy. This page summarizes what's supported and what each tier unlocks.

Per-machine support matrix

Hardware	OS	LLM serving	Multimodal serving	CPU fallback
NVIDIA GPU (modern data-center, e.g., H100/H200/L40 class)	Linux / Windows	✓ — full coverage	✓ — including image, video, audio	n/a
NVIDIA GPU (modern consumer, e.g., RTX 40 / 50 series)	Linux / Windows	✓ — most models	✓ — many multimodal	n/a
NVIDIA GPU (older consumer, e.g., RTX 30 / 20 series)	Linux / Windows	✓ — many models	partial	available
Apple Silicon (M1 / M2 / M3 / M4, including Pro / Max / Ultra variants)	macOS	✓ — extensive	✓ — many multimodal	n/a
AMD ROCm (RDNA 3 / CDNA 3 generation)	Linux	✓ — most models	partial	available
CPU only (modern x86_64 / ARM64)	any	✓ — smaller models	limited	primary

Memory guidance

For local LLM serving, plan disk and memory roughly as follows:

Model size	Disk needed for weights	RAM (CPU)	VRAM (GPU)
≤ 8B parameters	~10–20 GB	16 GB+	8–16 GB
8–32B parameters	~30–80 GB	32 GB+	24–48 GB
32–70B parameters	~80–200 GB	64 GB+	48–96 GB
> 70B parameters	~200 GB+	128 GB+	96 GB+ or multi-GPU

These are guidelines; actual requirements depend on the model, its quantization (where applicable), and the engine Syntax selects.
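
As a rough cross-check on the table, the memory taken by model weights can be estimated from the parameter count and the number of bits stored per weight. The sketch below is a back-of-the-envelope calculation, not Syntax's own sizing logic: the function name `estimate_weight_gb` and the `overhead` parameter are illustrative assumptions, and real usage also depends on the KV cache, context length, and engine.

```python
def estimate_weight_gb(params_billion: float,
                       bits_per_weight: int = 16,
                       overhead: float = 0.0) -> float:
    """Approximate memory for model weights, in decimal GB.

    Hypothetical helper for illustration only; it is not part of Syntax.
    overhead is an assumed fractional allowance (0.2 means +20%) for
    runtime buffers on top of the raw weights.
    """
    if params_billion <= 0 or bits_per_weight <= 0 or overhead < 0:
        raise ValueError("params_billion and bits_per_weight must be positive; "
                         "overhead must be non-negative")
    # parameters * bits per parameter / 8 bits per byte
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9


if __name__ == "__main__":
    # An 8B model at 16 bits per weight
    print(f"{estimate_weight_gb(8):.1f} GB")                      # 16.0 GB
    # A 70B model quantized to 4 bits per weight
    print(f"{estimate_weight_gb(70, bits_per_weight=4):.1f} GB")  # 35.0 GB
```

The two results sit inside the VRAM ranges the table gives for the ≤ 8B and 32–70B tiers, which is why quantization moves a model between tiers.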

Optional dependencies

Dependency	When it's needed
Docker	Optional. Recommended on Linux when you're running engines that ship as containers. The desktop app guides you through enabling Docker if it's not present.
NVIDIA driver	Required on NVIDIA hardware. Syntax expects a recent driver; `syntax doctor` will warn if the version is too old.
ROCm runtime	Required on AMD hardware. Syntax detects ROCm at first launch and falls back to CPU if it's missing.
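
To check a machine by hand before installing, one option is to look for the command-line tools that normally ship with each stack: `nvidia-smi` comes with the NVIDIA driver and `rocminfo` comes with the ROCm runtime. The sketch below only illustrates that check and the CPU fallback described above; `detect_backend` is a hypothetical helper, and Syntax's actual detection logic is not documented here and may differ.

```python
import shutil
from typing import Callable, Optional


def detect_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Guess the accelerator stack from tools found on PATH.

    Hypothetical helper, not Syntax's implementation. Finding a tool only
    shows that the stack is installed, not that the version is recent enough.
    """
    if which("nvidia-smi"):   # installed alongside the NVIDIA driver
        return "nvidia"
    if which("rocminfo"):     # installed alongside the ROCm runtime
        return "rocm"
    return "cpu"              # nothing found: fall back to CPU


if __name__ == "__main__":
    print(detect_backend())
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which` from the standard library.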

Multi-GPU

Multi-GPU is supported on Linux for both NVIDIA and AMD where the underlying engine and the chosen model support tensor- or pipeline-parallel serving. Syntax's autotuner sets the parallelism strategy based on the model and the available GPUs, so you don't have to choose one.
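
One reason the model matters: a common constraint in tensor-parallel implementations is that the number of attention heads must divide evenly across the GPUs. The sketch below lists the tensor-parallel degrees that satisfy that one constraint. It is an illustration under that assumption, not a description of Syntax's autotuner, and `valid_tensor_parallel_degrees` is a hypothetical name.

```python
from typing import List


def valid_tensor_parallel_degrees(num_attention_heads: int,
                                  num_gpus: int) -> List[int]:
    """Return degrees d (1..num_gpus) that evenly divide the head count.

    Assumes the only constraint is head divisibility; real engines may add
    others (memory per GPU, interconnect, vocabulary size, and so on).
    """
    if num_attention_heads < 1 or num_gpus < 1:
        raise ValueError("num_attention_heads and num_gpus must be >= 1")
    return [d for d in range(1, num_gpus + 1) if num_attention_heads % d == 0]


if __name__ == "__main__":
    # 32 heads on an 8-GPU machine
    print(valid_tensor_parallel_degrees(32, 8))   # [1, 2, 4, 8]
    # 28 heads on the same machine cannot be split 8 ways
    print(valid_tensor_parallel_degrees(28, 8))   # [1, 2, 4, 7]
```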

Multi-host

Multi-host deployments are supported via the Remote self-hosted and Managed remote targets. For local multi-host workflows, treat each host as a remote target and deploy the party across them.

Where to go next

Local inference — running models on the machine in front of you.
Multi-engine inference — why Syntax picks the engine it does.
Multimodal capabilities — image, video, audio, 3D, time-series forecasting.
