
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Live Captions is a Dockerized web application that provides real-time speech-to-text captions using OpenAI's Whisper model (via faster-whisper). It captures microphone audio in the browser, streams it to a Flask backend for transcription, and displays captions with customizable styling.

## Commands

### Development
```bash
# Build and run (primary development command)
docker compose up --build

# Run in background
docker compose up -d --build

# View logs
docker compose logs -f

# Stop
docker compose down

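# Remove containers and named volumes (cached models)
docker compose down -v

# ./data is a bind mount, so `down -v` leaves the settings database in place; delete it separately
rm -f data/settings.db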
```

### First-time setup
```bash
cp .env.example .env
docker compose up --build
```

## Architecture

```
Browser                          Docker Container
┌─────────────────────┐         ┌─────────────────────────────┐
│  MediaRecorder API  │         │  Flask + Flask-SocketIO     │
│  (1.5s audio chunks)│ ──────► │         (app.py)            │
│                     │ WebSocket│            │                │
│  Caption Display    │ ◄────── │  faster-whisper transcriber │
│  (word-by-word)     │         │      (transcriber.py)       │
│                     │         │            │                │
│  Settings Panel     │ ──────► │  SQLite settings persistence│
│                     │ REST API│      (database.py)          │
└─────────────────────┘         └─────────────────────────────┘
```

### Data Flow
1. Browser captures mic audio using MediaRecorder, sends base64-encoded WebM chunks every 1.5s via WebSocket
2. Backend converts WebM→WAV using pydub/ffmpeg, transcribes with faster-whisper
3. Backend sends the transcribed text back via the WebSocket `transcription` event
4. Frontend animates words appearing one-by-one for streaming effect
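
The backend half of this flow (steps 2 and 3) can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py` or `transcriber.py`: the function names `webm_to_wav`, `transcribe_wav`, and `process_chunk` are hypothetical, and the 16 kHz mono conversion is an assumption based on the sample rate Whisper models operate on.

```python
import base64
import io
from typing import Callable


def webm_to_wav(webm_bytes: bytes) -> bytes:
    """Convert a WebM chunk to 16 kHz mono WAV with pydub (requires ffmpeg)."""
    from pydub import AudioSegment  # third-party; imported lazily

    segment = AudioSegment.from_file(io.BytesIO(webm_bytes), format="webm")
    segment = segment.set_frame_rate(16000).set_channels(1)
    out = io.BytesIO()
    segment.export(out, format="wav")
    return out.getvalue()


def transcribe_wav(wav_bytes: bytes, model) -> str:
    """Run a faster-whisper WhisperModel over WAV bytes and join the segments."""
    segments, _info = model.transcribe(io.BytesIO(wav_bytes))
    return " ".join(seg.text.strip() for seg in segments).strip()


def process_chunk(
    audio_b64: str,
    convert: Callable[[bytes], bytes],
    transcribe: Callable[[bytes], str],
) -> dict:
    """Decode one base64 chunk, convert it, and build the `transcription` payload."""
    webm_bytes = base64.b64decode(audio_b64)
    wav_bytes = convert(webm_bytes)
    return {"text": transcribe(wav_bytes)}
```

With a loaded model, a handler could call `process_chunk(payload["audio"], webm_to_wav, lambda wav: transcribe_wav(wav, model))` and emit the result. Passing the converter and transcriber in as callables keeps the orchestration testable without ffmpeg or a model download.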

### Key Files
- **app.py**: Flask server with SocketIO WebSocket handlers and REST API for settings
- **transcriber.py**: Whisper model loading and audio transcription (singleton model instance)
- **database.py**: SQLite CRUD for user display preferences
- **static/js/app.js**: Audio capture, WebSocket client, word animation queue
- **static/js/settings.js**: Settings panel UI and persistence

## Configuration

Environment variables in `.env`:
- `WHISPER_MODEL`: Model size (tiny/base/small/medium/large); larger models are more accurate but slower
- `WHISPER_DEVICE`: cpu or cuda
- `WHISPER_COMPUTE_TYPE`: int8/float16/float32 (`int8_float16` is also available on GPU)
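
As a rough sketch of how these variables might feed a singleton model instance: the `WhisperConfig` class, the `load_config` and `get_model` names, and the fallback defaults (`base`, `cpu`, `int8`) are assumptions for illustration, not values confirmed by `transcriber.py` or `.env.example`.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class WhisperConfig:
    model: str
    device: str
    compute_type: str


def load_config(env=None) -> WhisperConfig:
    """Read the three WHISPER_* variables, falling back to assumed defaults."""
    env = os.environ if env is None else env
    return WhisperConfig(
        model=env.get("WHISPER_MODEL", "base"),
        device=env.get("WHISPER_DEVICE", "cpu"),
        compute_type=env.get("WHISPER_COMPUTE_TYPE", "int8"),
    )


@lru_cache(maxsize=1)
def get_model():
    """Load the model once per process; later calls reuse the cached instance."""
    from faster_whisper import WhisperModel  # third-party; imported lazily

    cfg = load_config()
    return WhisperModel(cfg.model, device=cfg.device, compute_type=cfg.compute_type)
```

Caching the loader means a change to `.env` only takes effect after the container restarts, which matches how Compose injects environment variables.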

User display settings stored in SQLite (`data/settings.db`):
- Font family, size, weight, color
- Background color, opacity, border radius, padding
- Max words (controls caption buffer length)
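
One simple way to persist settings like these is a key-value table. The sketch below is an assumption about shape only: the table name, the setting keys, and the default values are hypothetical and may not match `database.py`.

```python
import sqlite3

# Hypothetical defaults; the real keys and values live in database.py.
DEFAULTS = {"font_size": "32", "font_color": "#ffffff", "max_words": "30"}


def init_db(conn: sqlite3.Connection) -> None:
    """Create the settings table and seed any missing defaults."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO settings (key, value) VALUES (?, ?)", DEFAULTS.items()
    )
    conn.commit()


def get_settings(conn: sqlite3.Connection) -> dict:
    """Return all settings as a plain dict of strings."""
    return dict(conn.execute("SELECT key, value FROM settings"))


def update_settings(conn: sqlite3.Connection, changes: dict) -> None:
    """Insert or overwrite the given keys; values are stored as text."""
    conn.executemany(
        "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
        [(key, str(value)) for key, value in changes.items()],
    )
    conn.commit()


def reset_settings(conn: sqlite3.Connection) -> None:
    """Drop all stored values and reseed the defaults."""
    conn.execute("DELETE FROM settings")
    init_db(conn)
```

In the container, the connection would point at `data/settings.db`; using `sqlite3.connect(":memory:")` gives a throwaway database for experimenting.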

## API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/` | GET | Main UI |
| `/api/health` | GET | Health check |
| `/api/settings` | GET/PUT | Read/update user settings |
| `/api/settings/reset` | POST | Reset to defaults |
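
For quick manual checks, the endpoints can be exercised with the standard library. This sketch assumes the server listens on `http://localhost:5000` and returns JSON bodies, neither of which is stated above; the `font_size` key in the usage note is also hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: adjust to the published port


def build_request(method: str, path: str, body=None) -> urllib.request.Request:
    """Build a request for one endpoint, JSON-encoding the body if present."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(request: urllib.request.Request):
    """Send the request and decode the response, assuming a JSON body."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call(build_request("GET", "/api/health"))` checks liveness, and `call(build_request("PUT", "/api/settings", {"font_size": 48}))` would update one setting if the API accepts partial updates.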

## WebSocket Events

| Event | Direction | Payload |
|-------|-----------|---------|
| `audio_data` | client→server | `{audio: base64, format: 'webm'}` |
| `transcription` | server→client | `{text: string}` |
| `settings_updated` | server→client | settings object |
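
The `audio_data` payload shape suggests a small validation step before any audio is handed to ffmpeg. The helpers below are a hypothetical sketch (the names `make_audio_data` and `parse_audio_data` do not come from the repository), showing how a payload of that shape could be built and checked.

```python
import base64
import binascii


def make_audio_data(chunk: bytes) -> dict:
    """Build an `audio_data` payload from raw WebM bytes (client side)."""
    return {"audio": base64.b64encode(chunk).decode("ascii"), "format": "webm"}


def parse_audio_data(payload) -> bytes:
    """Validate an `audio_data` payload and return the decoded bytes."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    if payload.get("format") != "webm":
        raise ValueError("unsupported format: expected 'webm'")
    audio = payload.get("audio")
    if not isinstance(audio, str) or not audio:
        raise ValueError("audio must be a non-empty base64 string")
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(audio, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio is not valid base64") from exc
```

Rejecting malformed payloads early keeps a bad client message from surfacing as an opaque ffmpeg or decoder error further down the pipeline.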

## Volumes

- `./data:/app/data` - SQLite database persistence
- `whisper-models` - Cached Whisper model files (~140MB for base)

## NVIDIA GPU Support

GPU acceleration significantly improves transcription speed. Follow these steps to enable it.

### Prerequisites

1. NVIDIA GPU with CUDA support
2. NVIDIA driver installed (`nvidia-smi` should work)
3. Docker installed

### Install NVIDIA Container Toolkit

```bash
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify installation
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

### Configure for GPU

1. Update `.env`:
```env
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
```

2. Run with GPU support:
```bash
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
```

### GPU Compute Types

| Type | Speed | Memory | Notes |
|------|-------|--------|-------|
| `float16` | Fast | Medium | Recommended for most GPUs |
| `int8_float16` | Faster | Lower | Good balance |
| `float32` | Slower | Higher | Maximum precision |

### Troubleshooting

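- **"could not select device driver"**: The NVIDIA Container Toolkit is not installed, or Docker was not restarted after the runtime was configured
- **CUDA out of memory**: Try a smaller model (`WHISPER_MODEL=small` or `tiny`) or a lower-memory compute type (`WHISPER_COMPUTE_TYPE=int8_float16`)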
- **Verify GPU access**: `docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi`