model

Manage local LLM models. Models are stored under ~/.foil/models/ and registered in ~/.foil/models.json.
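Both locations can be inspected directly to see what Foil has on disk. The following is a minimal read-only sketch. `foil_inventory` is a hypothetical helper name. It makes no assumption about the layout inside `models/` or the schema of `models.json`; it only prints them.

```shell
# Hypothetical helper: print the contents of Foil's model store.
# Read-only; takes an optional root directory (defaults to ~/.foil).
foil_inventory() {
  foil_root="${1:-$HOME/.foil}"
  echo "Model files in $foil_root/models/:"
  if [ -d "$foil_root/models" ]; then
    ls -1 "$foil_root/models"
  else
    echo "(directory not found)"
  fi
  echo "Registry $foil_root/models.json:"
  if [ -f "$foil_root/models.json" ]; then
    cat "$foil_root/models.json"
  else
    echo "(file not found)"
  fi
}

foil_inventory
```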

foil model

Manage LLM models.

Usage:

foil model [OPTIONS] COMMAND [ARGS]...

Options:

  --help  Show this message and exit.

foil model activate

Activate a downloaded model. Restarts vllm-mlx if the server is running.

Usage:

foil model activate [OPTIONS] NAME

Options:

  --help  Show this message and exit.

foil model delete

Delete a downloaded model.

Usage:

foil model delete [OPTIONS] NAME

Options:

  --help  Show this message and exit.
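Because `delete` removes a downloaded model, scripts that call it may benefit from a guard. The following is a sketch of a hypothetical wrapper, `safe_delete`, that refuses to delete the shipped default. It assumes the default's local NAME is `Qwen2.5-Coder-7B-Instruct-4bit`, meaning the repo ID without its `mlx-community/` prefix, following the pattern used in Switching models.

```shell
# Hypothetical guard around `foil model delete`: refuse to remove the
# shipped default model.
# ASSUMPTION: the default model's local NAME is its repo ID minus the org prefix.
DEFAULT_MODEL_NAME="Qwen2.5-Coder-7B-Instruct-4bit"

safe_delete() {
  if [ -z "$1" ]; then
    echo "usage: safe_delete NAME" >&2
    return 2
  fi
  if [ "$1" = "$DEFAULT_MODEL_NAME" ]; then
    echo "refusing to delete the default model: $1" >&2
    return 1
  fi
  foil model delete "$1"
}

# Example: safe_delete Qwen2.5-Coder-14B-Instruct-4bit
```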

foil model download

Download a model from HuggingFace.

Usage:

foil model download [OPTIONS] REPO_ID

Options:

  --help  Show this message and exit.
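`download` takes a Hugging Face REPO_ID, while `activate` and `delete` take a NAME. The example in Switching models suggests that NAME is the repo ID with its organisation prefix dropped. Under that assumption, a small hypothetical helper can derive it:

```shell
# Derive the local model NAME from a Hugging Face REPO_ID.
# ASSUMPTION (inferred from the Switching models example, not from Foil's source):
# NAME is the last path component of REPO_ID.
repo_to_name() {
  printf '%s\n' "${1##*/}"
}

repo_to_name mlx-community/Qwen2.5-Coder-14B-Instruct-4bit   # prints Qwen2.5-Coder-14B-Instruct-4bit
```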

foil model list

List downloaded models.

Usage:

foil model list [OPTIONS]

Options:

  --help  Show this message and exit.
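A setup script can combine `list` and `download` so that a model is downloaded only when it is missing. The following is a sketch with a hypothetical helper, `ensure_model`. It assumes that `foil model list` prints each model's NAME somewhere in its output and that NAME is the repo ID without its organisation prefix. This page documents neither assumption.

```shell
# Hypothetical helper: download REPO_ID unless it already appears in `foil model list`.
# ASSUMPTIONS: list output contains each model's NAME, and NAME is the last
# path component of REPO_ID. This is a substring match, so a longer name that
# contains this one would also count as present.
ensure_model() {
  repo_id="$1"
  model_name="${repo_id##*/}"
  if foil model list | grep -qF -- "$model_name"; then
    echo "already downloaded: $model_name"
  else
    foil model download "$repo_id"
  fi
}

# Example: ensure_model mlx-community/Qwen2.5-Coder-14B-Instruct-4bit
```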

Default model

Foil ships with mlx-community/Qwen2.5-Coder-7B-Instruct-4bit as the default model. It is a code-specialised model quantised to 4 bits, about 4 GB in size, and generates roughly 50 tokens/s on Apple M-series Macs (actual speed varies by chip).
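The size figures follow from the quantisation. At 4 bits per weight, each parameter takes half a byte, so the weights alone come to about half the parameter count in bytes. The following is a back-of-envelope sketch. `weights_gb` is a hypothetical helper, and it uses the nominal parameter count from the model name, which can differ somewhat from the real one.

```shell
# Back-of-envelope weight size for a quantised model, in decimal GB:
#   bytes = parameters * bits_per_weight / 8
# Weights only: ignores quantisation scales, KV cache, and runtime overhead.
weights_gb() {
  awk -v params_b="$1" -v bits="$2" \
    'BEGIN { printf "%.1f\n", params_b * 1e9 * bits / 8 / 1e9 }'
}

weights_gb 7 4    # prints 3.5
weights_gb 14 4   # prints 7.0
```

Both results sit a little below the quoted figures (~4 GB for 7B, ≈ 8 GB for 14B). That gap is consistent with the overhead the sketch deliberately leaves out.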

Switching models

```shell
# Download an alternative
foil model download mlx-community/Qwen2.5-Coder-14B-Instruct-4bit

# Activate it for new scans
foil model activate Qwen2.5-Coder-14B-Instruct-4bit

# Verify
foil server status
```

Larger models are generally more accurate but need more unified memory: the 14B model takes ≈ 8 GB, so use a Mac with 24 GB or more. If the server is running, the engine restarts automatically when you activate a model.
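If you switch models often, the three steps above can be wrapped in one function. The following is a sketch. `switch_model` is a hypothetical name. Deriving the activation NAME by dropping the organisation prefix from the REPO_ID is an assumption based on the example above, not documented behaviour.

```shell
# Hypothetical wrapper: download a model, activate it, then show server status.
# ASSUMPTION: the NAME passed to `activate` is REPO_ID minus its org prefix.
switch_model() {
  repo_id="$1"
  if [ -z "$repo_id" ]; then
    echo "usage: switch_model REPO_ID" >&2
    return 2
  fi
  model_name="${repo_id##*/}"
  foil model download "$repo_id" || return 1
  foil model activate "$model_name" || return 1
  foil server status
}

# Example: switch_model mlx-community/Qwen2.5-Coder-14B-Instruct-4bit
```

The `|| return 1` guards stop the wrapper from trying to activate a model whose download failed.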