CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Live Captions is a Dockerized web application that provides real-time speech-to-text captions using OpenAI's Whisper model (via faster-whisper). It captures microphone audio in the browser, streams it to a Flask backend for transcription, and displays captions with customizable styling.

Commands

Development

# Build and run (primary development command)
docker compose up --build

# Run in background
docker compose up -d --build

# View logs
docker compose logs -f

# Stop
docker compose down

# Reset all data (database + cached models)
docker compose down -v

First-time setup

cp .env.example .env
docker compose up --build

Architecture

Browser                          Docker Container
┌─────────────────────┐         ┌─────────────────────────────┐
│  MediaRecorder API  │         │  Flask + Flask-SocketIO     │
│  (1.5s audio chunks)│ ──────► │         (app.py)            │
│                     │WebSocket│            │                │
│  Caption Display    │ ◄────── │  faster-whisper transcriber │
│  (word-by-word)     │         │      (transcriber.py)       │
│                     │         │            │                │
│  Settings Panel     │ ──────► │  SQLite settings persistence│
│                     │ REST API│      (database.py)          │
└─────────────────────┘         └─────────────────────────────┘

Data Flow

  1. Browser captures mic audio using MediaRecorder, sends base64-encoded WebM chunks every 1.5s via WebSocket
  2. Backend converts WebM→WAV using pydub/ffmpeg, transcribes with faster-whisper
  3. Transcribed text sent back via WebSocket transcription event
  4. Frontend animates words appearing one-by-one for streaming effect
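Step 4's streaming effect amounts to a bounded word queue: each transcription event is split into words, appended one at a time, and the front is trimmed to the display limit. A Python sketch for illustration only (the real implementation lives in static/js/app.js; push_transcription is a hypothetical name, and the trimming ties in with the "Max words" setting described under Configuration):

```python
from collections import deque

def push_transcription(buffer: deque, text: str, max_words: int = 50) -> str:
    """Append words from one transcription event, one at a time,
    trimming the oldest words so the caption never exceeds max_words."""
    for word in text.split():
        buffer.append(word)
        while len(buffer) > max_words:
            buffer.popleft()
    return " ".join(buffer)
```

Calling this once per incoming transcription event yields the current caption text after each update.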

Key Files

  • app.py: Flask server with SocketIO WebSocket handlers and REST API for settings
  • transcriber.py: Whisper model loading and audio transcription (singleton model instance)
  • database.py: SQLite CRUD for user display preferences
  • static/js/app.js: Audio capture, WebSocket client, word animation queue
  • static/js/settings.js: Settings panel UI and persistence
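The "singleton model instance" in transcriber.py presumably means the Whisper model is loaded once per process and reused across requests. A hedged sketch of that pattern (the loader is injected so the example runs without faster-whisper installed; in the real module it would be the WhisperModel constructor):

```python
import threading

_model = None
_model_lock = threading.Lock()

def get_model(loader):
    """Return the process-wide model instance, loading it on first use.

    `loader` is a zero-argument callable standing in for the real
    model constructor; double-checked locking keeps concurrent
    WebSocket handlers from loading the model twice.
    """
    global _model
    if _model is None:
        with _model_lock:
            if _model is None:
                _model = loader()
    return _model
```

Loading Whisper weights is slow and memory-heavy, so a per-process singleton is the usual choice here.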

Configuration

Environment variables in .env:

  • WHISPER_MODEL: Model size (tiny/base/small/medium/large) - larger models are more accurate but slower
  • WHISPER_DEVICE: cpu or cuda
  • WHISPER_COMPUTE_TYPE: int8/float16/float32
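A minimal sketch of how the backend might read these variables (the fallback defaults shown here are assumptions for illustration, not necessarily what the repo uses):

```python
import os

def whisper_config(env=os.environ) -> dict:
    """Collect the documented Whisper settings from the environment.

    Defaults are illustrative assumptions; check app.py / transcriber.py
    for the repo's actual fallbacks.
    """
    return {
        "model": env.get("WHISPER_MODEL", "base"),
        "device": env.get("WHISPER_DEVICE", "cpu"),
        "compute_type": env.get("WHISPER_COMPUTE_TYPE", "int8"),
    }
```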

User display settings stored in SQLite (data/settings.db):

  • Font family, size, weight, color
  • Background color, opacity, border radius, padding
  • Max words (controls caption buffer length)
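A hedged sketch of the kind of key/value CRUD database.py might implement with the standard-library sqlite3 module (the single settings table is an assumption; the real schema may use one column per setting):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS settings (
    key   TEXT PRIMARY KEY,
    value TEXT NOT NULL
)
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def set_setting(conn: sqlite3.Connection, key: str, value: str) -> None:
    # Upsert: insert a new setting or overwrite the existing value.
    conn.execute(
        "INSERT INTO settings (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )
    conn.commit()

def get_setting(conn: sqlite3.Connection, key: str, default=None):
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

In the container, the database path would be under /app/data so the ./data volume persists it across restarts.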

API Endpoints

Endpoint             Method    Purpose
/                    GET       Main UI
/api/health          GET       Health check
/api/settings        GET/PUT   Read/update user settings
/api/settings/reset  POST      Reset to defaults

WebSocket Events

Event             Direction       Payload
audio_data        client→server   {audio: base64, format: 'webm'}
transcription     server→client   {text: string}
settings_updated  server→client   settings object
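The audio_data payload can be built and unpacked with nothing but the standard library. A sketch assuming the documented shape (function names are illustrative; the browser side does the equivalent in JavaScript):

```python
import base64
import json

def make_audio_event(webm_bytes: bytes) -> str:
    """Client side: wrap a 1.5 s WebM chunk in the documented
    {audio: base64, format: 'webm'} event payload."""
    return json.dumps({
        "audio": base64.b64encode(webm_bytes).decode("ascii"),
        "format": "webm",
    })

def decode_audio_event(payload: str) -> bytes:
    """Server side: recover the raw WebM bytes, ready for the
    pydub/ffmpeg WebM->WAV conversion step."""
    event = json.loads(payload)
    if event.get("format") != "webm":
        raise ValueError("unexpected audio format")
    return base64.b64decode(event["audio"])
```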

Volumes

  • ./data:/app/data - SQLite database persistence
  • whisper-models - Cached Whisper model files (~140MB for base)

NVIDIA GPU Support

GPU acceleration significantly improves transcription speed. Follow these steps to enable it.

Prerequisites

  1. NVIDIA GPU with CUDA support
  2. NVIDIA driver installed (nvidia-smi should work)
  3. Docker installed

Install NVIDIA Container Toolkit

# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify installation
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

Configure for GPU

  1. Update .env:

WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

  2. Run with GPU support:

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
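The repo's docker-compose.gpu.yml is an override file; a sketch of what such an override typically contains (the service name app is an assumption here - it must match the service defined in docker-compose.yml):

```yaml
# Sketch of a GPU override file, not the repo's actual contents.
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```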

GPU Compute Types

Type          Speed    Memory   Notes
float16       Fast     Medium   Recommended for most GPUs
int8_float16  Faster   Lower    Good balance
float32       Slower   Higher   Maximum precision

Troubleshooting

  • "could not select device driver": NVIDIA Container Toolkit not installed or Docker not restarted
  • CUDA out of memory: Try a smaller model (WHISPER_MODEL=small or tiny)
  • Verify GPU access: docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi