Hardware support
What hardware Syntax runs on, and which capabilities each tier unlocks.
Syntax detects your hardware on first launch and chooses a serving stack for each model you deploy. The tables below list the supported hardware, rough memory requirements, and optional dependencies.
Per-machine support matrix
| Hardware | OS | LLM serving | Multimodal serving | CPU fallback |
|---|---|---|---|---|
| NVIDIA GPU (modern data-center, e.g., H100/H200/L40 class) | Linux / Windows | ✓ — full coverage | ✓ — including image, video, audio | n/a |
| NVIDIA GPU (modern consumer, e.g., RTX 40 / 50 series) | Linux / Windows | ✓ — most models | ✓ — many multimodal | n/a |
| NVIDIA GPU (older consumer, e.g., RTX 30 / 20 series) | Linux / Windows | ✓ — many models | partial | available |
| Apple Silicon (M1 / M2 / M3 / M4 + Pro / Max / Ultra) | macOS | ✓ — extensive | ✓ — many multimodal | n/a |
| AMD ROCm (RDNA 3 / CDNA 3 generation) | Linux | ✓ — most models | partial | available |
| CPU only (modern x86_64 / ARM64) | any | ✓ — smaller models | limited | primary |
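The matrix above can be read as a small decision procedure. The sketch below is not Syntax's detection logic; it only encodes the hardware and OS pairings from the table. The function name `support_row` and its boolean inputs are hypothetical, and the caller is assumed to know whether an NVIDIA or ROCm stack is present.

```python
import platform


def support_row(system: str, machine: str, has_nvidia: bool, has_rocm: bool) -> str:
    """Map a host description to a row of the support matrix.

    Hypothetical helper: the rules only mirror the Hardware and OS columns
    of the table and say nothing about which models will actually run.
    """
    machine = machine.lower()
    # Apple Silicon is listed for macOS only.
    if system == "Darwin" and machine in ("arm64", "aarch64"):
        return "Apple Silicon"
    # NVIDIA GPUs are listed for Linux and Windows.
    if has_nvidia and system in ("Linux", "Windows"):
        return "NVIDIA GPU"
    # AMD ROCm is listed for Linux only.
    if has_rocm and system == "Linux":
        return "AMD ROCm"
    # Everything else falls through to the CPU-only row.
    return "CPU only"


if __name__ == "__main__":
    # platform.system() returns e.g. "Linux", "Darwin", or "Windows".
    print(support_row(platform.system(), platform.machine(), False, False))
```

Note that an AMD GPU on Windows lands in the CPU-only row here, because the table lists ROCm support for Linux only.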
Memory guidance
For local LLM serving, plan disk and memory roughly as follows:
| Model size | Disk needed for weights | RAM (CPU) | VRAM (GPU) |
|---|---|---|---|
| ≤ 8B parameters | ~10–20 GB | 16 GB+ | 8–16 GB |
| 8–32B parameters | ~30–80 GB | 32 GB+ | 24–48 GB |
| 32–70B parameters | ~80–200 GB | 64 GB+ | 48–96 GB |
| ≥ 70B parameters | ~200 GB+ | 128 GB+ | 96 GB+ or multi-GPU |
These figures are rough guidelines; actual requirements depend on the model, its quantization (if any), and the engine.
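To see where numbers of this order come from, the size of the weights alone can be estimated as parameter count times bytes per weight. The helper below is an illustrative sketch, not part of Syntax; `weight_size_gb` is a hypothetical name, and the estimate ignores KV cache, activations, and runtime overhead.

```python
def weight_size_gb(params_billions: float, bits_per_weight: int = 16) -> float:
    """Approximate size of the model weights alone, in GB (10**9 bytes).

    Excludes KV cache, activations, and engine overhead, so real memory
    use will be higher than this figure.
    """
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    for size in (8, 32, 70):
        print(f"{size}B: 16-bit ~{weight_size_gb(size):.0f} GB, "
              f"4-bit ~{weight_size_gb(size, 4):.0f} GB")
```

By this arithmetic an 8B model needs about 16 GB for 16-bit weights and about 4 GB at 4 bits, which is why quantization changes the hardware a model fits on.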
Optional dependencies
| Dependency | When it's needed |
|---|---|
| Docker | Optional. Recommended on Linux when you're running engines that ship as containers. The desktop app guides you through enabling Docker if it's not present. |
| NVIDIA driver | Required on NVIDIA hardware. Syntax expects a recent driver; `syntax doctor` will warn if the version is too old. |
| ROCm runtime | Required on AMD hardware. Syntax detects ROCm at first launch and falls back to CPU if it's missing. |
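Before installing, you can check whether these dependencies are already on a machine. The sketch below is an independent preflight check, not what Syntax runs internally: it assumes the standard command-line tools `docker`, `nvidia-smi`, and `rocminfo` are installed alongside each dependency, and `find_tools` is a hypothetical helper name.

```python
from __future__ import annotations

import shutil

# Command-line tools that normally ship with each dependency. Finding one on
# PATH is only a hint; it does not prove the driver or runtime is working.
TOOLS = {
    "Docker": "docker",
    "NVIDIA driver": "nvidia-smi",
    "ROCm runtime": "rocminfo",
}


def find_tools(tools: dict[str, str] = TOOLS) -> dict[str, str | None]:
    """Return the resolved path of each tool, or None if it is not on PATH."""
    return {name: shutil.which(cmd) for name, cmd in tools.items()}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found'}")
```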
Multi-GPU
Multi-GPU is supported on Linux for both NVIDIA and AMD where the underlying engine and the chosen model support tensor- or pipeline-parallel serving. Syntax's autotuner sets the parallelism strategy based on the model and the available GPUs, so you don't have to choose one.
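If you want to see which GPUs a host exposes before deploying, the following sketch lists NVIDIA devices by parsing `nvidia-smi` output. It is not how the autotuner works and does not cover AMD; `parse_gpus` and `list_gpus` are hypothetical helpers, and the parser assumes every field is reported as a plain value.

```python
from __future__ import annotations

import subprocess

# Ask nvidia-smi for one CSV line per GPU; with nounits, memory is in MiB.
QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpus(csv_text: str) -> list[dict]:
    """Parse nvidia-smi CSV output into one dict per GPU.

    Assumes numeric index and memory fields; a device that reports
    a non-numeric value would raise ValueError.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, mem = (field.strip() for field in line.split(","))
        gpus.append({"index": int(index), "name": name, "memory_mib": int(mem)})
    return gpus


def list_gpus() -> list[dict]:
    """Run nvidia-smi and return its GPUs, or [] if the tool is unavailable."""
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpus(out.stdout)


if __name__ == "__main__":
    for gpu in list_gpus():
        print(f"GPU {gpu['index']}: {gpu['name']} ({gpu['memory_mib']} MiB)")
```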
Multi-host
Multi-host deployments are supported via the Remote self-hosted and Managed remote targets. For local multi-host workflows, treat each host as a remote target and deploy the party across them.
Where to go next
- Local inference — running models on the machine in front of you.
- Multi-engine inference — why Syntax picks the engine it does.
- Multimodal capabilities — image, video, audio, 3D, time-series forecasting.