Step 6: Install Ollama

Local vs. cloud — what happens where?

It is important to understand what runs locally in this workflow and what goes through the cloud:

StepWhereNotes
Zotero MCP (querying library)✅ LocalConnection via localhost
yt-dlp (fetching transcripts)✅ LocalScraping on your Mac
whisper.cpp (transcribing audio)✅ LocalM4 Metal GPU
Semantic search (Zotero MCP)✅ LocalLocal vector database
Reasoning, summarizing, writing synthesesLocalQwen3.5:9b via Ollama (default); Anthropic API only when --hd is used

In the default mode, all generative work — summaries, literature notes, flashcards — is handled locally by Qwen3.5:9b via Ollama. Claude Code orchestrates the workflow but does not send source content to the Anthropic API. No tokens, no data transfer.

The honest trade-off: local models are less capable than Claude Sonnet for complex or nuanced tasks. For simple summaries and flashcards the difference is small; for writing rich literature notes or drawing subtle connections between sources, Claude Sonnet is noticeably better. Add --hd to any request to switch to Claude Sonnet 4.6 via the Anthropic API for that task — Claude Code always announces this and asks for confirmation before making the API call.


6a. Install

brew install ollama

6b. Download models

For an M4 Mac mini with 24 GB memory, Qwen3.5:9b is the recommended default model for all local processing tasks in the workflow:

ollama pull qwen3.5:9b   # ~6.6 GB — default model for all tasks

Qwen3.5:9b has a context window of 256K tokens, recent training with explicit attention to multilingualism (201 languages, including Dutch), and a hybrid architecture that stays fast even with long input. It replaces both llama3.1:8b and mistral for all workflow tasks.

# Optional alternatives (not needed if you use qwen3.5:9b):
ollama pull llama3.1:8b   # ~4.7 GB — proven, but 128K context and less multilingual
ollama pull phi3           # ~2.3 GB — very compact, for systems with less memory

Check which models are available after downloading:

ollama list

6c. Start Ollama

ollama serve

Leave this running in a separate Terminal window, or configure it as a background service so it is always available:

# Start automatically at system startup (recommended):
brew services start ollama

Check whether Ollama is active and which models are available:

ollama list

6d. How Ollama is used in the workflow

Qwen3.5:9b is the default engine for all generative tasks. Claude Code calls Ollama via its bash tool whenever it needs to generate a summary, write a literature note, or create flashcards. No explicit configuration is needed per task — the CLAUDE.md in your vault already sets this as the default.

Option 1: Verify per task (if needed)

To confirm that a specific task runs locally, you can instruct Claude Code explicitly:

Use Ollama (qwen3.5:9b) to create a summary of this transcript.
Call the model via: ollama run qwen3.5:9b

Claude Code runs the command locally via the bash tool and processes the output without making an Anthropic API call for that step.

Option 2: Default rule in CLAUDE.md

The starter CLAUDE.md from step 7d already contains the following section — no further action needed. It instructs Claude Code to use Ollama by default for all processing tasks:

## Local processing via Ollama

By default, all processing runs locally via Qwen3.5:9b. Use this for:
- Literature notes based on papers or transcripts
- Thematic syntheses
- Flashcard generation

Call Ollama via the bash tool:
`ollama run qwen3.5:9b < inbox/[filename].txt`

**Maximum quality mode:** if the user adds `--hd` or explicitly asks for "maximum quality" or "use Sonnet", switch to Claude Sonnet 4.6 via the Anthropic API. Always announce this first and wait for confirmation before making the API call. Never automatically fall back to Sonnet if Qwen is unreachable — report that Ollama is not active and ask what the user wants.

Option 3: Test whether Ollama is reachable

Verify from Claude Code that Ollama is active and processes a test prompt:

ollama run qwen3.5:9b "Give a three-sentence summary about substitution care."

If this returns a response, Ollama is ready for use from Claude Code.