# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Live Captions is a Dockerized web application that provides real-time speech-to-text captions using OpenAI's Whisper model (via faster-whisper). It captures microphone audio in the browser, streams it to a Flask backend for transcription, and displays captions with customizable styling.

## Commands

### Development

```bash
# Build and run (primary development command)
docker compose up --build

# Run in background
docker compose up -d --build

# View logs
docker compose logs -f

# Stop
docker compose down

# Stop and remove named volumes (cached models).
# The database lives in the bind-mounted ./data directory, which this does not delete.
docker compose down -v
```

### First-time setup

```bash
cp .env.example .env
docker compose up --build
```

## Architecture

```
Browser                          Docker Container
┌─────────────────────┐          ┌─────────────────────────────┐
│  MediaRecorder API  │          │  Flask + Flask-SocketIO     │
│  (1.5s audio chunks)│ ──────►  │  (app.py)                   │
│                     │ WebSocket│                             │
│  Caption Display    │ ◄──────  │  faster-whisper transcriber │
│  (word-by-word)     │          │  (transcriber.py)           │
│                     │          │                             │
│  Settings Panel     │ ──────►  │  SQLite settings persistence│
│                     │ REST API │  (database.py)              │
└─────────────────────┘          └─────────────────────────────┘
```

### Data Flow

1. Browser captures mic audio using MediaRecorder and sends base64-encoded WebM chunks every 1.5s via WebSocket
2. Backend converts WebM→WAV using pydub/ffmpeg, then transcribes with faster-whisper
3. Transcribed text is sent back via the WebSocket `transcription` event
4. Frontend animates words appearing one by one for a streaming effect

### Key Files

- **app.py**: Flask server with SocketIO WebSocket handlers and REST API for settings
- **transcriber.py**: Whisper model loading and audio transcription (singleton model instance)
- **database.py**: SQLite CRUD for user display preferences
- **static/js/app.js**: Audio capture, WebSocket client, word animation queue
- **static/js/settings.js**: Settings panel UI and persistence

## Configuration

Environment variables in `.env`:

- `WHISPER_MODEL`: Model size (tiny/base/small/medium/large); larger models are more accurate but slower
- `WHISPER_DEVICE`: cpu or cuda
- `WHISPER_COMPUTE_TYPE`: int8/float16/float32

User display settings stored in SQLite (`data/settings.db`):

- Font family, size, weight, color
- Background color, opacity, border radius, padding
- Max words (controls caption buffer length)

## API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/` | GET | Main UI |
| `/api/health` | GET | Health check |
| `/api/settings` | GET/PUT | Read/update user settings |
| `/api/settings/reset` | POST | Reset to defaults |

## WebSocket Events

| Event | Direction | Payload |
|-------|-----------|---------|
| `audio_data` | client→server | `{audio: base64, format: 'webm'}` |
| `transcription` | server→client | `{text: string}` |
| `settings_updated` | server→client | settings object |

## Volumes

- `./data:/app/data` - SQLite database persistence
- `whisper-models` - Cached Whisper model files (~140MB for base)

## NVIDIA GPU Support

GPU acceleration significantly improves transcription speed. Follow these steps to enable it.

### Prerequisites

1. NVIDIA GPU with CUDA support
2. NVIDIA driver installed (`nvidia-smi` should work)
3. Docker installed

### Install NVIDIA Container Toolkit

```bash
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify installation
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

### Configure for GPU

1. Update `.env`:
   ```env
   WHISPER_DEVICE=cuda
   WHISPER_COMPUTE_TYPE=float16
   ```

2. Run with GPU support:
   ```bash
   docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
   ```

### GPU Compute Types

| Type | Speed | Memory | Notes |
|------|-------|--------|-------|
| `float16` | Fast | Medium | Recommended for most GPUs |
| `int8_float16` | Faster | Lower | Good balance |
| `float32` | Slower | Higher | Maximum precision |

### Troubleshooting

- **"could not select device driver"**: NVIDIA Container Toolkit not installed or Docker not restarted
- **CUDA out of memory**: Try a smaller model (`WHISPER_MODEL=small` or `tiny`)
- **Verify GPU access**: `docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi`
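
A typo in `.env` can be hard to tell apart from a driver or toolkit problem, so it may help to rule it out first. The following is a minimal sketch, not part of the repository: `check_whisper_env` and `DOCUMENTED_VALUES` are hypothetical names, and the allowed values are simply those listed in the Configuration and GPU Compute Types sections of this file. The application itself may accept other values.

```python
import os
from typing import List, Mapping

# Values listed in this file (Configuration and GPU Compute Types sections).
# Assumption: the application may accept more than these; this check only
# flags anything this document does not mention.
DOCUMENTED_VALUES = {
    "WHISPER_MODEL": {"tiny", "base", "small", "medium", "large"},
    "WHISPER_DEVICE": {"cpu", "cuda"},
    "WHISPER_COMPUTE_TYPE": {"int8", "int8_float16", "float16", "float32"},
}


def check_whisper_env(env: Mapping[str, str]) -> List[str]:
    """Return one warning per variable that is unset or not documented here."""
    warnings = []
    for name, allowed in DOCUMENTED_VALUES.items():
        value = env.get(name)
        if value is None:
            warnings.append(f"{name} is not set")
        elif value not in allowed:
            warnings.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return warnings


if __name__ == "__main__":
    # Print each warning, or a single confirmation line if there are none.
    for line in check_whisper_env(os.environ) or ["Whisper settings match this file"]:
        print(line)
```

Run inside the container (or with the `.env` values exported), it prints one line per unset or undocumented variable. An unset variable is reported rather than treated as an error, since the application may fall back to its own defaults.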