Phase 3: processing to the vault

Phase 3 converts approved items into structured Obsidian notes. All generation runs locally via Qwen3.5:9b. Add --hd to any command to use Claude Sonnet 4.6 instead (after explicit confirmation).
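
The --hd switch with its confirmation step could look roughly like the sketch below. This is an illustration only: choose_model, parse_args, and the prompt text are hypothetical names, and the model labels are simply the ones used in this document, not API identifiers.

```python
import argparse

LOCAL_MODEL = "Qwen3.5:9b"       # default, runs locally
HD_MODEL = "Claude Sonnet 4.6"   # only used with --hd and an explicit "yes"


def choose_model(hd: bool, confirm=input) -> str:
    """Return the model label to use; --hd falls back to local unless confirmed."""
    if not hd:
        return LOCAL_MODEL
    # Assumption: confirmation is a typed "yes"; the real prompt may differ.
    answer = confirm(f"Send content to {HD_MODEL}? Type 'yes' to confirm: ")
    return HD_MODEL if answer.strip().lower() == "yes" else LOCAL_MODEL


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the (hypothetical) command line of a Phase 3 script."""
    parser = argparse.ArgumentParser(description="Phase 3 processing (sketch)")
    parser.add_argument("--hd", action="store_true",
                        help="use the hosted model after explicit confirmation")
    return parser.parse_args(argv)
```

Passing the confirmation function as a parameter keeps the decision testable without a terminal.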

Papers

Papers reach _inbox via the Zotero browser extension, the iOS app, or automatically via the feedreader (after calibration). After a Go decision in Phase 2:

verwerk recente papers

Claude Code:

1. Retrieves metadata from Zotero MCP (title, authors, year, journal, citation key, tags), but no full text
2. Calls the local subagent process_item.py with only the item key and metadata:
   - process_item.py fetches the full text locally, generates a structured note via Qwen3.5:9b, builds the YAML frontmatter, and writes the .md file to literature/
   - Claude Code receives only {"status": "ok", "path": "literature/..."} and no source content
3. Adds [[internal links]] to related notes in the vault
4. Removes the item from Zotero _inbox
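
The hand-off between the orchestration layer and the subagent can be sketched as follows. The command-line flags (--item-key, --metadata) and the script path are assumptions, since the actual interface of process_item.py is not documented here. The point of the sketch is the strict check on the reply, so that nothing beyond status and path is passed on.

```python
import json
import subprocess
import sys

ALLOWED_KEYS = {"status", "path"}  # the only fields the orchestrator should see


def parse_subagent_result(stdout: str) -> dict:
    """Parse the subagent's JSON reply and reject any unexpected field."""
    result = json.loads(stdout)
    extra = set(result) - ALLOWED_KEYS
    if extra:
        raise ValueError(f"unexpected keys from subagent: {sorted(extra)}")
    return result


def run_process_item(item_key: str, metadata: dict,
                     script: str = ".claude/process_item.py") -> dict:
    """Run the local subagent and return only its status reply."""
    # Hypothetical CLI: the flag names and script location are assumptions.
    cmd = [sys.executable, script,
           "--item-key", item_key,
           "--metadata", json.dumps(metadata)]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_subagent_result(completed.stdout)
```

Rejecting unknown keys, rather than silently dropping them, makes an accidental leak of source text visible as an error.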

The generated note contains:

YAML frontmatter (title, authors, year, journal, citation key, tags, status)
Core question and main argument
Key findings (3–5 points)
Methodological notes
Relevant quotes (original language)
Links to related notes
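
As an illustration of the frontmatter step, the sketch below renders a flat metadata dictionary as a YAML block using only the standard library. The field names are hypothetical and the real process_item.py may build its frontmatter differently. Strings are quoted with json.dumps, which yields double-quoted scalars that YAML parsers accept.

```python
import json


def build_frontmatter(meta: dict) -> str:
    """Render a flat dict (scalars and lists of scalars) as YAML frontmatter."""
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            # Lists become block sequences, one quoted item per line.
            lines.append(f"{key}:")
            lines.extend(f"  - {json.dumps(item, ensure_ascii=False)}"
                         for item in value)
        else:
            # Quoting protects values containing ':' or '#'.
            lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

For example, build_frontmatter({"title": "A: B", "year": 2024}) quotes the title so that the colon inside it cannot be misread as a key separator.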

Notes are saved to literature/[author-year-keyword1-keyword2].md — Qwen selects 2–4 nouns from the title and TLDR.
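
The naming scheme can be sketched as below. Keyword selection is done by Qwen and is not reproduced here; the function only assembles the path from an author, a year, and keywords that were already chosen. The slug rules (lowercase ASCII, hyphens) are an assumption about how the real script normalizes names.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase ASCII slug: accents stripped, other characters become '-'."""
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore")
                  .decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def note_filename(first_author: str, year: int, keywords: list[str]) -> str:
    """Build literature/[author-year-keyword1-keyword2...].md from 2-4 keywords."""
    if not 2 <= len(keywords) <= 4:
        raise ValueError("expected 2-4 keywords")
    parts = [slugify(first_author), str(year)] + [slugify(k) for k in keywords]
    return "literature/" + "-".join(parts) + ".md"
```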

Privacy: no paper content ever appears in Claude Code's context. process_item.py is a self-contained local subagent — it fetches, generates, and writes without returning any source text to the orchestration layer.

YouTube videos

YouTube items follow an eager transcript pipeline: when you mark a video ✅ in the feedreader, attach-transcript.py runs automatically and stores a cleaned transcript as an attachment in the Zotero item, mirroring how a PDF accompanies a paper. Go/No-go decisions can then be based on the actual content.

If the transcript attachment is missing (e.g. for manually added items), run it explicitly:

~/.local/share/uv/tools/zotero-mcp-server/bin/python3 .claude/attach-transcript.py \
  --item-key ITEMKEY --url "https://www.youtube.com/watch?v=..."

This script:

Fetches the transcript via YouTubeTranscriptApi (or from .claude/transcript_cache/)
Generates an abstract via Qwen3.5:9b
Uploads the transcript as a .txt attachment to Zotero; sets abstractNote
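
The cache-first lookup in the first step might work along these lines. The cache file layout (one .txt file per video ID) is an assumption, and the actual download is passed in as a function so that the sketch makes no claim about the transcript library's API.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Extract the video ID from a watch?v=... or youtu.be/... URL."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        raise ValueError(f"no video id in {url!r}")
    return ids[0]


def get_transcript(url: str, cache_dir: str,
                   fetch: Callable[[str], str]) -> str:
    """Return the cached transcript if present, else fetch and cache it."""
    video_id = video_id_from_url(url)
    # Assumed layout: <cache_dir>/<video_id>.txt
    cache_file = Path(cache_dir) / f"{video_id}.txt"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = fetch(video_id)  # e.g. a wrapper around the transcript API
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text
```

A call such as get_transcript(url, ".claude/transcript_cache", fetch) would then hit the network only the first time a video is seen.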

After a Go decision, generate the literature note the same way as for papers:

verwerk recente papers

Claude Code calls process_item.py, which reads the transcript attachment from Zotero locally via fetch-fulltext.py. No transcript content reaches Claude Code.

The generated note contains:

YAML frontmatter (title, authors, year, tags, status, Zotero deep link)
TLDR
Key findings (3–5 points)
Methodological notes
(No "Relevant quotes" section — timestamps are unreliable without a verifiable source)
Links to related notes
Flashcards (max 3)

Notes are saved to literature/[author-year-keyword1-keyword2].md — Qwen selects 2–4 nouns from the title and TLDR.

Podcasts

Podcast transcripts are created manually with attach-transcript.py. Transcribing with whisper.cpp means downloading the audio and then processing it for several minutes, so this step cannot run in the batch pipeline.

~/.local/share/uv/tools/zotero-mcp-server/bin/python3 .claude/attach-transcript.py \
  --item-key ITEMKEY --url "https://podcast-episode-page-url"

This script:

Downloads audio using the direct MP3 URL cached from the RSS <enclosure> tag (via feedreader-score.py) or falls back to yt-dlp
Detects language automatically from cached show notes (Dutch show notes → --language nl); override with --language if needed
Transcribes locally via whisper-cli (model: large-v3-turbo, Metal GPU, ~2–3 min per 30 min audio on M4)
If abstractNote is already filled (show notes set by enrich-inbox.py): moves it to a child note titled "Shownotes"
Generates an abstract via Qwen3.5:9b; sets abstractNote; stores transcript as .txt linked-file attachment; adds tag _enriched-transcript
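
The language detection and transcription steps above can be sketched as follows. How attach-transcript.py actually detects the language is not described, so the stopword count here is only one plausible heuristic, and the word lists are illustrative. The whisper-cli flags used (-m, -f, -l, -otxt) are standard whisper.cpp options; the model path in the usage note is an assumption.

```python
import re

# Illustrative hint words; a real detector would use a larger list or a library.
DUTCH_HINTS = {"de", "het", "een", "van", "dat", "niet", "met", "voor", "zijn", "ook"}
ENGLISH_HINTS = {"the", "and", "that", "not", "with", "for", "are", "this", "from", "you"}


def detect_language(show_notes: str, default: str = "en") -> str:
    """Guess 'nl' or 'en' by counting common function words in the show notes."""
    words = re.findall(r"[^\W\d_]+", show_notes.lower())
    nl_hits = sum(word in DUTCH_HINTS for word in words)
    en_hits = sum(word in ENGLISH_HINTS for word in words)
    if nl_hits == en_hits:
        return default  # no evidence either way, including empty show notes
    return "nl" if nl_hits > en_hits else "en"


def whisper_command(audio_path: str, model_path: str, language: str) -> list[str]:
    """Build the whisper-cli argument list for a plain-text transcript."""
    return ["whisper-cli", "-m", model_path, "-f", audio_path,
            "-l", language, "-otxt"]
```

A caller could pass the result to subprocess.run, for example with a model file such as models/ggml-large-v3-turbo.bin, and let an explicit --language value override the detected one.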

After a Go decision, generate the literature note via process_item.py, the same way as for papers.

If yt-dlp fails with "Unsupported URL": add the feed to feedreader-list.txt. After the next feedreader-score.py run, the direct audio URL is cached and used automatically.
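
To show where the direct audio URL comes from, the sketch below pulls the <enclosure> URL for each episode out of an RSS 2.0 feed. It is not the code of feedreader-score.py. Keying the result by the episode's <link> is an assumption about how a cached URL would later be matched to the --url argument.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(rss_xml: str) -> dict[str, str]:
    """Map each episode page link to the audio URL in its <enclosure> tag."""
    root = ET.fromstring(rss_xml)
    urls = {}
    for item in root.iter("item"):
        link = item.findtext("link")
        enclosure = item.find("enclosure")
        # Skip items without a page link or without an audio enclosure.
        if link and enclosure is not None and enclosure.get("url"):
            urls[link.strip()] = enclosure.get("url")
    return urls
```

With such a mapping cached, a lookup by episode page URL can supply the MP3 address directly, and yt-dlp is needed only when the lookup misses.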

Notes follow the same structure as YouTube notes: there is no "Relevant quotes" section (timestamps are unreliable), and all other sections are the same as for papers.

RSS web articles

Non-academic articles from RSS feeds that you forward to _inbox can be processed in two ways:

Via Zotero (recommended for articles worth citing):

verwerk recente papers

The item is already in _inbox with metadata from the Zotero Connector and is processed as a standard literature note.

Notes are tagged #web or #beleid ("policy") as appropriate.

After processing

After each session, check whether:

New notes are linked to related existing notes ([[double brackets]])
Relevant syntheses in syntheses/ need updating
Flashcards should be created for the new notes:

maak flashcards voor literature/[note].md

If new papers were added to Zotero, update the semantic search database:

update-zotero
