Set up Ollama for free local embeddings and LLM — no API keys, no rate limits

Ollama (Local Embeddings & LLM)

Ollama provides free, unlimited local embeddings for memory search — no API keys, no rate limits, no cloud dependency.

Why Ollama?

| Problem | Solution |
|---|---|
| node-llama-cpp crashes on Apple Silicon (Metal GPU bug) | Ollama handles Metal natively |
| Gemini/OpenAI free tier hits rate limits (429) | Local = zero API calls |
| Memory search fails when quota exhausted | Always available offline |

Supported Platforms

| Platform | Status | Notes |
|---|---|---|
| macOS (Apple Silicon M1–M4) | ✅ Tested | Metal GPU acceleration |
| macOS (Intel) | ✅ Works | CPU only, slower |
| Ubuntu / Debian | ✅ Works | NVIDIA GPU optional (CUDA auto-detected) |
| WSL2 (Windows) | ✅ Works | GPU passthrough with NVIDIA |
| Windows native | ✅ Works | Direct install from ollama.com |

Install

macOS

# Install
brew install ollama

# Start as background service (auto-starts on login)
brew services start ollama

# Pull embedding model (274 MB)
ollama pull nomic-embed-text

# Optional: chat model for local tasks (2 GB)
ollama pull llama3.2:3b

Ubuntu / Debian

# Install (one-liner)
curl -fsSL https://ollama.com/install.sh | sh

# Ollama starts automatically via systemd
systemctl status ollama

# Pull models
ollama pull nomic-embed-text
ollama pull llama3.2:3b   # optional

NVIDIA GPU (optional): Ollama auto-detects CUDA if drivers are installed (nvidia-smi should work).

WSL2

# Option A: Install inside WSL2 (recommended)
curl -fsSL https://ollama.com/install.sh | sh

# Option B: Install on Windows natively (download from ollama.com)
# Then access from WSL2 at http://host.docker.internal:11434

Verify

ollama list
curl -s http://localhost:11434/v1/models
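
To confirm embeddings work end to end, you can also hit the OpenAI-compatible embeddings endpoint directly. This assumes Ollama is running locally and nomic-embed-text has been pulled; a successful response should include an embedding vector under data:

```shell
# Request a single embedding through the OpenAI-compatible endpoint
# (the same API shape Zirkabot uses). Falls through to a message if
# the server is not running.
curl -s http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "hello world"}' \
  || echo "Ollama is not reachable on :11434"
```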

Zirkabot Configuration

Set memorySearch in your zirkabot.json to use Ollama's OpenAI-compatible API:

{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "openai",
        "model": "nomic-embed-text",
        "remote": {
          "apiKey": "ollama",
          "baseUrl": "http://localhost:11434/v1"
        },
        "fallback": "gemini",
        "query": {
          "hybrid": {
            "enabled": true,
            "vectorWeight": 0.7,
            "textWeight": 0.3
          }
        },
        "experimental": { "sessionMemory": true },
        "sources": ["memory", "sessions"],
        "cache": {
          "enabled": true
        }
      }
    }
  }
}
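
The hybrid block blends semantic (vector) and keyword (text) match scores. As a rough sketch of how the configured weights combine — assuming both scores are normalized to 0–1; Zirkabot's actual fusion logic may differ — a result's combined score looks like this:

```shell
# Hypothetical normalized scores for one search result (illustrative values).
vector_score=0.82   # semantic similarity from the embedding model
text_score=0.40     # keyword-style text match
# Weighted blend using the configured vectorWeight (0.7) and textWeight (0.3).
awk -v v="$vector_score" -v t="$text_score" \
    'BEGIN { printf "%.3f\n", 0.7 * v + 0.3 * t }'
```

Raising vectorWeight favors semantically similar memories; raising textWeight favors exact keyword hits.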

| Field | Value | Why |
|---|---|---|
| provider | "openai" | Ollama exposes an OpenAI-compatible API |
| remote.baseUrl | http://localhost:11434/v1 | Ollama's local endpoint |
| remote.apiKey | "ollama" | Required by provider, ignored by Ollama |
| model | "nomic-embed-text" | 768-dim embeddings, fast, good quality |
| fallback | "gemini" | Optional cloud fallback if Ollama is down |
| experimental.sessionMemory | true | Index past conversations for search |
| sources | ["memory", "sessions"] | Search both memory files and session transcripts |

For WSL2 accessing Windows-hosted Ollama, use http://host.docker.internal:11434/v1 as the baseUrl.
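
In that case only the remote block changes; everything else stays as in the example above:

```json
"remote": {
  "apiKey": "ollama",
  "baseUrl": "http://host.docker.internal:11434/v1"
}
```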


Resource Usage

| Model | Disk | RAM (loaded) | Purpose |
|---|---|---|---|
| nomic-embed-text | 274 MB | ~300 MB | Memory search embeddings |
| llama3.2:3b | 2.0 GB | ~2 GB | Local chat (optional) |

Ollama unloads models from RAM after 5 minutes of inactivity.
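
If you'd rather keep models resident (for example, to avoid the reload pause on the first search after idle), Ollama reads the OLLAMA_KEEP_ALIVE environment variable at server start. How you pass it depends on how the service is launched (e.g. a systemd override on Linux); a minimal sketch:

```shell
# Keep loaded models in RAM for 30 minutes instead of the 5-minute default.
# Must be set in the environment of the "ollama serve" process.
export OLLAMA_KEEP_ALIVE=30m
echo "$OLLAMA_KEEP_ALIVE"
```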

Alternative Embedding Models

| Model | Dimensions | Size | Notes |
|---|---|---|---|
| nomic-embed-text | 768 | 274 MB | Recommended — good balance |
| mxbai-embed-large | 1024 | 670 MB | Higher quality, more RAM |
| all-minilm | 384 | 46 MB | Smallest, fastest |

To switch models: run ollama pull <model>, then update "model" in your config.
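
For example, to switch to mxbai-embed-large (a sketch showing only the relevant fields):

```json
"memorySearch": {
  "provider": "openai",
  "model": "mxbai-embed-large",
  "remote": {
    "apiKey": "ollama",
    "baseUrl": "http://localhost:11434/v1"
  }
}
```

Note that changing the model triggers a memory index rebuild, since the embedding dimensions differ between models.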


Troubleshooting

Ollama not responding

# macOS
brew services restart ollama

# Linux
sudo systemctl restart ollama

Memory search returns empty after switching

The memory index rebuilds when the embedding provider/model changes. Give it a moment on first search, or restart Zirkabot.

MLX warning on macOS

WARN MLX dynamic library not available

Harmless — Ollama uses Metal instead. No action needed.

WSL2 can't reach Ollama on Windows

Use http://host.docker.internal:11434/v1 instead of localhost.
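
Also check that the Windows-side server listens on all interfaces — by default it binds to localhost only, which WSL2 cannot reach. Setting OLLAMA_HOST=0.0.0.0 in the Windows environment before starting Ollama fixes this. If host.docker.internal does not resolve in your distro, the Windows host is usually reachable at the default-route gateway; the address below is a hypothetical placeholder, not a real fixed IP:

```shell
# Find your actual gateway with: ip route show default | awk '{print $3}'
# 172.28.80.1 is an illustrative placeholder only.
WIN_HOST=172.28.80.1
echo "http://${WIN_HOST}:11434/v1"
```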