LLM Setup
Built-in Model
Cai ships with a built-in model (Ministral 3B) that downloads automatically on first launch (~2.15 GB). It runs locally via Metal GPU acceleration — no external server or configuration needed.
The built-in model handles all AI-powered actions out of the box: Summarize, Reply, Fix Grammar, Translate, Explain, and Ask AI.
Custom Models
You can also run your own GGUF models with the built-in provider. Drop any .gguf file into:
~/Library/Application Support/Cai/models/
Then select it from the model picker in Settings, or click the chip icon in the action view. Cai will restart the built-in server with your chosen model.
Using an External Provider
Want to use a different or larger model? Cai works with any OpenAI-compatible server — local or remote. Just switch the provider in settings.
Supported Providers
| Provider | Default URL | Setup |
|---|---|---|
| LM Studio | http://127.0.0.1:1234/v1 | Download → Load a model → Start server |
| Ollama | http://127.0.0.1:11434/v1 | Install → ollama pull llama3.2 |
| Jan AI | http://127.0.0.1:1337/v1 | Download → Load a model → Start server |
| LocalAI | http://127.0.0.1:8080/v1 | Setup guide |
| Open WebUI | http://127.0.0.1:8080/v1 | Install → Enable OpenAI API |
| GPT4All | http://127.0.0.1:4891/v1 | Download → Enable API server |
| OpenAI | https://api.openai.com/v1 | Get API key → Enter in Cai settings |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai | Get API key → Enter in Cai settings |
| Mistral | https://api.mistral.ai/v1 | Get API key → Enter in Cai settings |
| Custom | User-defined | Any OpenAI-compatible server (local or cloud) |
How to Configure
- Left-click the Cai menu bar icon (or click the Cai logo in the action window footer)
- Select your Model Provider from the dropdown
- If using Custom, enter your server’s full URL
That’s it — Cai will use your external LLM instead of the built-in model.
API Key (Optional)
If your server requires authentication (e.g., cloud providers), enter your API key in Cai’s settings. The key is stored locally on your Mac and sent only to the server you configure — never to Cai or any third party.
Recommended Models
Not sure which model to use? See our recommended models for tested suggestions based on your hardware and use case.
Verify Your Server
You can confirm your external LLM server is running by opening Terminal and running:
# LM Studio (default)
curl http://127.0.0.1:1234/v1/models
# Ollama
curl http://127.0.0.1:11434/v1/models
If you get a JSON response listing models, your server is ready.
Note: Cai uses the OpenAI-compatible
/v1/chat/completionsendpoint. Any server (local or cloud) that implements this API will work.