LLM Setup

Built-in Model

Cai ships with a built-in model (Ministral 3B) that downloads automatically on first launch (~2.15 GB). It runs locally via Metal GPU acceleration — no external server or configuration needed.

The built-in model handles all AI-powered actions out of the box: Summarize, Reply, Fix Grammar, Translate, Explain, and Ask AI.

Custom Models

You can also run your own GGUF models with the built-in provider. Drop any .gguf file into:

~/Library/Application Support/Cai/models/

Then select it from the model picker in Settings, or click the chip icon in the action view. Cai will restart the built-in server with your chosen model.
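From Terminal, installing a custom model can be sketched like this (the directory path is the one above; `my-model.gguf` is a placeholder filename for whatever GGUF file you downloaded):

```shell
# Create Cai's models directory if it doesn't exist yet.
MODELS_DIR="$HOME/Library/Application Support/Cai/models"
mkdir -p "$MODELS_DIR"

# Copy your downloaded model in (my-model.gguf is a placeholder name):
#   cp ~/Downloads/my-model.gguf "$MODELS_DIR/"

# Confirm it's in place — files here appear in the model picker.
ls "$MODELS_DIR"
```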

Using an External Provider

Want to use a different or larger model? Cai works with any OpenAI-compatible server — local or remote. Just switch the provider in settings.

Supported Providers

| Provider | Default URL | Setup |
|---|---|---|
| LM Studio | http://127.0.0.1:1234/v1 | Download → Load a model → Start server |
| Ollama | http://127.0.0.1:11434/v1 | Install → `ollama pull llama3.2` |
| Jan AI | http://127.0.0.1:1337/v1 | Download → Load a model → Start server |
| LocalAI | http://127.0.0.1:8080/v1 | Setup guide |
| Open WebUI | http://127.0.0.1:8080/v1 | Install → Enable OpenAI API |
| GPT4All | http://127.0.0.1:4891/v1 | Download → Enable API server |
| OpenAI | https://api.openai.com/v1 | Get API key → Enter in Cai settings |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai | Get API key → Enter in Cai settings |
| Mistral | https://api.mistral.ai/v1 | Get API key → Enter in Cai settings |
| Custom | User-defined | Any OpenAI-compatible server (local or cloud) |

How to Configure

  1. Left-click the Cai menu bar icon (or click the Cai logo in the action window footer)
  2. Select your Model Provider from the dropdown
  3. If using Custom, enter your server’s full URL

That’s it — Cai will use your external LLM instead of the built-in model.

API Key (Optional)

If your server requires authentication (e.g., cloud providers), enter your API key in Cai’s settings. The key is stored locally on your Mac and sent only to the server you configure — never to Cai or any third party.
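In the OpenAI-compatible API, the key travels as a standard Bearer token in the request header. As a sketch, you can check a key yourself from Terminal (`sk-your-key-here` is a placeholder; substitute your real key):

```shell
# List the models your key can access. A valid key returns a JSON model
# list; an invalid key returns a JSON error object instead.
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer sk-your-key-here" \
  || echo '{"error": "request failed — check your network"}'
```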

Not sure which model to use? See our recommended models for tested suggestions based on your hardware and use case.

Verify Your Server

You can confirm your external LLM server is running by opening Terminal and running:

# LM Studio (default)
curl http://127.0.0.1:1234/v1/models

# Ollama
curl http://127.0.0.1:11434/v1/models

If you get a JSON response listing models, your server is ready.

Note: Cai uses the OpenAI-compatible /v1/chat/completions endpoint. Any server (local or cloud) that implements this API will work.
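As a sketch of the kind of request Cai makes, here is a minimal /v1/chat/completions call against LM Studio's default URL (the model name is a placeholder; most local servers use whichever model is currently loaded regardless of this field):

```shell
# Send a one-message chat completion request to a local server.
curl -s http://127.0.0.1:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local-model",
        "messages": [{"role": "user", "content": "Say hello in five words."}]
      }' \
  || echo '{"error": "request failed — is the server running?"}'
```

A JSON response containing a `choices` array means the endpoint works end to end, not just that the server is up.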