Step 11: Podcast integration (whisper.cpp)

With whisper.cpp you can transcribe podcasts and other audio recordings entirely locally on your M4 Mac mini. The M4 chip executes this via Metal (the GPU) extremely fast: one hour of audio takes approximately 3–5 minutes.

11a. Install whisper.cpp

brew install whisper-cpp

Verify the installation:

whisper-cpp --version

11b. Download and transcribe a podcast

yt-dlp supports many podcast platforms beyond YouTube (SoundCloud, Podbean, direct MP3 links via RSS). The full pipeline in two steps:

# Step 1: download audio to inbox/
yt-dlp -x --audio-format mp3 \
  "https://[podcast-url]" \
  -o "~/Documents/ResearchVault/inbox/%(title)s.%(ext)s"

# Step 2: transcribe — whisper detects the language automatically
whisper-cpp --model small \
  ~/Documents/ResearchVault/inbox/[filename].mp3

This creates a .txt and a .vtt file alongside the .mp3. The .txt file contains the transcription without timestamps; the .vtt file contains timestamps per segment.

Language

Whisper detects the language automatically based on the first few seconds of audio. For most monolingual podcasts (Dutch or English) this works fine and you do not need to configure anything. Only pass --language explicitly if you encounter problems, for example with multilingual content or if automatic detection picks the wrong language:

# Only needed if detection fails:
whisper-cpp --model small --language nl ~/Documents/ResearchVault/inbox/[file].mp3
whisper-cpp --model small --language en ~/Documents/ResearchVault/inbox/[file].mp3

In the daily workflow, Claude Code lets the language be detected automatically based on metadata (podcast title, channel, description) and only passes --language explicitly if that metadata is unclear.

Models and quality

Model	Size	Speed (1 hour audio)	Quality
`base`	~145 MB	~2 min	Good for clear speech
`small`	~465 MB	~4 min	Recommended starting point
`medium`	~1.5 GB	~8 min	Better for accents, fast speech
`large`	~3 GB	~15 min	Best quality

Models are automatically downloaded on first use.

11c. Process in the vault

After the transcript is available in inbox/, open Claude Code in your vault:

cd ~/Documents/ResearchVault
claude

Give the instruction:

Process the podcast transcript in inbox/[filename].txt into a structured
note in literature/ with summary, key points, and timestamped quotes.

Claude Code follows the conventions from CLAUDE.md (see step 7d): title, speaker, summary, key points with timestamps, and links to related vault notes.

Privacy note: whisper.cpp runs entirely locally on your M4. No audio leaves your machine.

11d. Shortcut: full pipeline in one step

You can also have Claude Code run the full pipeline with a single instruction:

podcast https://[url-to-episode]

Claude Code downloads the audio, automatically determines the language based on metadata, runs whisper.cpp, and processes the transcript into a literature note — see also the skill (step "activate skill"). The Zotero _inbox step is skipped: the podcast goes directly to Obsidian. This is intended for episodes you have already evaluated and approved.

Local Research Workflow