# Ollama (Local Embeddings & LLM)
Ollama provides free, unlimited local embeddings for memory search — no API keys, no rate limits, no cloud dependency.
## Why Ollama?

| Problem | Solution |
|---|---|
| node-llama-cpp crashes on Apple Silicon (Metal GPU bug) | Ollama handles Metal natively |
| Gemini/OpenAI free tier hits rate limits (429) | Local = zero API calls |
| Memory search fails when quota exhausted | Always available offline |
## Supported Platforms
| Platform | Status | Notes |
|---|---|---|
| macOS (Apple Silicon M1–M4) | ✅ Tested | Metal GPU acceleration |
| macOS (Intel) | ✅ Works | CPU only, slower |
| Ubuntu / Debian | ✅ Works | NVIDIA GPU optional (CUDA auto-detected) |
| WSL2 (Windows) | ✅ Works | GPU passthrough with NVIDIA |
| Windows native | ✅ Works | Direct install from ollama.com |
## Install

### macOS

```bash
# Install
brew install ollama

# Start as a background service (auto-starts on login)
brew services start ollama

# Pull the embedding model (274 MB)
ollama pull nomic-embed-text

# Optional: chat model for local tasks (2 GB)
ollama pull llama3.2:3b
```
### Ubuntu / Debian

```bash
# Install (one-liner)
curl -fsSL https://ollama.com/install.sh | sh

# Ollama starts automatically via systemd
systemctl status ollama

# Pull models
ollama pull nomic-embed-text
ollama pull llama3.2:3b   # optional
```

**NVIDIA GPU (optional):** Ollama auto-detects CUDA if drivers are installed (`nvidia-smi` should work).
### WSL2

```bash
# Option A: Install inside WSL2 (recommended)
curl -fsSL https://ollama.com/install.sh | sh

# Option B: Install on Windows natively (download from ollama.com),
# then access from WSL2 at http://host.docker.internal:11434
```
## Verify

```bash
ollama list
curl -s http://localhost:11434/v1/models
```
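For scripted health checks, here is a minimal Python sketch (stdlib only) that queries the same `/v1/models` endpoint; `list_models` and `has_model` are illustrative names, not part of Ollama or Zirkabot:

```python
import json
import urllib.request

def list_models(base_url="http://localhost:11434/v1"):
    """Return model IDs from Ollama's OpenAI-compatible /models endpoint."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

def has_model(model_ids, name):
    """True if `name` is among the pulled models (version tags like ':latest' ignored)."""
    return any(m.split(":")[0] == name for m in model_ids)

# With Ollama running, this confirms the embedding model is pulled:
# has_model(list_models(), "nomic-embed-text")
```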
## Zirkabot Configuration

Set `memorySearch` in your `zirkabot.json` to use Ollama's OpenAI-compatible API:
```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "openai",
        "model": "nomic-embed-text",
        "remote": {
          "apiKey": "ollama",
          "baseUrl": "http://localhost:11434/v1"
        },
        "fallback": "gemini",
        "query": {
          "hybrid": {
            "enabled": true,
            "vectorWeight": 0.7,
            "textWeight": 0.3
          }
        },
        "experimental": { "sessionMemory": true },
        "sources": ["memory", "sessions"],
        "cache": {
          "enabled": true
        }
      }
    }
  }
}
```
| Field | Value | Why |
|---|---|---|
| `provider` | `"openai"` | Ollama exposes an OpenAI-compatible API |
| `remote.baseUrl` | `http://localhost:11434/v1` | Ollama's local endpoint |
| `remote.apiKey` | `"ollama"` | Required by the provider, ignored by Ollama |
| `model` | `"nomic-embed-text"` | 768-dim embeddings, fast, good quality |
| `fallback` | `"gemini"` | Optional cloud fallback if Ollama is down |
| `experimental.sessionMemory` | `true` | Index past conversations for search |
| `sources` | `["memory", "sessions"]` | Search both memory files and session transcripts |
For WSL2 accessing Windows-hosted Ollama, use `http://host.docker.internal:11434/v1` as the `baseUrl`.
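The `vectorWeight`/`textWeight` values control how vector-similarity and keyword scores are blended. Zirkabot's exact scoring internals aren't documented here; this sketch only illustrates the arithmetic the weights imply (the function name is hypothetical):

```python
def hybrid_score(vector_score, text_score, vector_weight=0.7, text_weight=0.3):
    """Weighted blend of a vector-similarity score and a keyword (text) score."""
    return vector_weight * vector_score + text_weight * text_score

# A result with a strong semantic match but weak keyword overlap
# still ranks well under the 0.7/0.3 split:
score = hybrid_score(vector_score=0.9, text_score=0.2)  # 0.69
```

Raising `textWeight` favors exact keyword matches; raising `vectorWeight` favors semantic similarity.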
## Resource Usage

| Model | Disk | RAM (loaded) | Purpose |
|---|---|---|---|
| `nomic-embed-text` | 274 MB | ~300 MB | Memory search embeddings |
| `llama3.2:3b` | 2.0 GB | ~2 GB | Local chat (optional) |
By default, Ollama unloads models from RAM after 5 minutes of inactivity.
## Alternative Embedding Models

| Model | Dimensions | Size | Notes |
|---|---|---|---|
| `nomic-embed-text` | 768 | 274 MB | Recommended — good balance |
| `mxbai-embed-large` | 1024 | 670 MB | Higher quality, more RAM |
| `all-minilm` | 384 | 46 MB | Smallest, fastest |
To switch models, run `ollama pull <model>`, then update `"model"` in the config.
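Embeddings from different models live in different spaces, and often have different dimensions, so vectors from the old and new model cannot be meaningfully compared. That is why the index must be rebuilt after a switch. A minimal cosine-similarity sketch showing the dimension check:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two embedding vectors; dimensions must match."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# A 768-dim vector (nomic-embed-text) cannot be scored against a
# 384-dim vector (all-minilm); the old index must be re-embedded.
```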
## Troubleshooting

### Ollama not responding

```bash
# macOS
brew services restart ollama

# Linux
sudo systemctl restart ollama
```
### Memory search returns empty after switching models

The memory index is rebuilt whenever the embedding provider or model changes. Give it a moment on the first search, or restart Zirkabot.
### MLX warning on macOS

```
WARN MLX dynamic library not available
```

Harmless — Ollama uses Metal instead. No action needed.
### WSL2 can't reach Ollama on Windows

Use `http://host.docker.internal:11434/v1` instead of `localhost`.