
# Live Captions

Real-time speech-to-text captions displayed in a customizable browser window, running entirely locally using OpenAI's Whisper model.

## Features

- **Local Processing**: All transcription happens on your machine - no data sent to external services
- **Real-time Captions**: Audio captured and transcribed in small chunks for near-instant feedback
- **Customizable Display**: Adjust font, colors, size, background opacity, and more
- **Recording Support**: Save caption sessions as markdown files
- **GPU Acceleration**: Optional NVIDIA GPU support for faster transcription
- **Docker-based**: Easy deployment with minimal setup

## Quick Start

### Prerequisites

- Docker and Docker Compose installed
- NVIDIA Container Toolkit installed (only needed for GPU acceleration)
- Microphone access in browser

### Installation

1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd live-captions
   ```

2. Create your environment file:
   ```bash
   cp .env.example .env
   ```

3. Build and run:
   ```bash
   docker compose up --build
   ```

4. Open http://localhost:5000 in your browser

5. Click "Start" and allow microphone access

## Configuration

### Environment Variables

Edit `.env` to customize:

| Variable | Default | Description |
|----------|---------|-------------|
| `WHISPER_MODEL` | `base` | Model size: `tiny`, `base`, `small`, `medium`, `large` |
| `WHISPER_DEVICE` | `cpu` | Processing device: `cpu` or `cuda` |
| `WHISPER_COMPUTE_TYPE` | `int8` | Precision: `int8`, `float16`, `int8_float16`, `float32` |
| `PORT` | `5000` | Server port |
| `AUDIO_CHUNK_DURATION` | `3` | Seconds of audio per chunk |
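
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a loader. The `CaptionConfig` class and `load_config` function are hypothetical names, not taken from the project's code, and treating `AUDIO_CHUNK_DURATION` as a float is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionConfig:
    """Values mirrored from the table above (hypothetical class name)."""
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    port: int
    audio_chunk_duration: float  # assumption: fractional seconds are allowed


def load_config(env=None) -> CaptionConfig:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return CaptionConfig(
        whisper_model=env.get("WHISPER_MODEL", "base"),
        whisper_device=env.get("WHISPER_DEVICE", "cpu"),
        whisper_compute_type=env.get("WHISPER_COMPUTE_TYPE", "int8"),
        port=int(env.get("PORT", "5000")),
        audio_chunk_duration=float(env.get("AUDIO_CHUNK_DURATION", "3")),
    )
```

Calling `load_config()` with no arguments reads from the process environment; passing a plain dict makes the function easy to exercise without touching real environment variables.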

### Model Sizes

| Model | Parameters | Speed | Accuracy | RAM Required |
|-------|------------|-------|----------|--------------|
| `tiny` | 39M | Fastest | Lower | ~1GB |
| `base` | 74M | Fast | Good | ~1GB |
| `small` | 244M | Medium | Better | ~2GB |
| `medium` | 769M | Slower | High | ~5GB |
| `large` | 1550M | Slowest | Highest | ~10GB |

### Display Settings

Access the settings panel in the web UI to customize:
- Font family, size, and weight
- Text and background colors
- Background opacity and border radius
- Maximum words displayed

Settings persist in a local SQLite database.
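
For readers curious what SQLite-backed settings persistence can look like, the sketch below uses a simple key/value table with JSON-encoded values. It is an assumption-laden illustration, not the project's `database.py`: the table layout, the function names, and the keys in `DEFAULTS` are all hypothetical.

```python
import json
import sqlite3

# Hypothetical defaults; the real setting names are defined by the application.
DEFAULTS = {"font_size": 32, "text_color": "#ffffff", "max_words": 20}


def open_store(path=":memory:"):
    """Open (or create) the settings database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def get_settings(conn):
    """Return the defaults overlaid with any stored overrides."""
    settings = dict(DEFAULTS)
    for key, value in conn.execute("SELECT key, value FROM settings"):
        settings[key] = json.loads(value)
    return settings


def update_settings(conn, changes):
    """Store the changed keys and return the merged settings."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            [(key, json.dumps(value)) for key, value in changes.items()],
        )
    return get_settings(conn)


def reset_settings(conn):
    """Drop all overrides so only the defaults remain."""
    with conn:
        conn.execute("DELETE FROM settings")
    return get_settings(conn)
```

Storing only the overrides means a reset is a single `DELETE`, and new defaults added in later versions apply automatically to existing databases.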

## Docker Commands

```bash
# Build and run
docker compose up --build

# Run in background
docker compose up -d --build

# View logs
docker compose logs -f

# Stop
docker compose down

# Reset all data (database + cached models)
docker compose down -v
```

## NVIDIA GPU Support

GPU acceleration can significantly improve transcription speed, typically 3-10x faster than CPU depending on the model size and hardware.

### Prerequisites

1. NVIDIA GPU with CUDA support
2. NVIDIA driver installed (verify with `nvidia-smi`)
3. Docker installed

### Install NVIDIA Container Toolkit

```bash
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify installation
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

### Enable GPU Mode

1. Update `.env`:
   ```env
   WHISPER_DEVICE=cuda
   WHISPER_COMPUTE_TYPE=float16
   ```

2. Run with GPU compose file:
   ```bash
   docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
   ```

### GPU Compute Types

| Type | Speed | Memory | Notes |
|------|-------|--------|-------|
| `float16` | Fast | Medium | Recommended for most GPUs |
| `int8_float16` | Faster | Lower | Good balance of speed/memory |
| `float32` | Slower | Higher | Maximum precision |

### GPU Troubleshooting

- **"could not select device driver"**: The NVIDIA Container Toolkit is not installed, or Docker was not restarted after installing it
- **CUDA out of memory**: Try a smaller model (`WHISPER_MODEL=small` or `tiny`)
- **Verify GPU access**:
  ```bash
  docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
  ```

## Architecture

```
Browser                          Docker Container
┌─────────────────────┐         ┌─────────────────────────────┐
│  MediaRecorder API  │         │  Flask + Flask-SocketIO     │
│  (audio chunks)     │ ──────► │         (app.py)            │
│                     │WebSocket│            │                │
│  Caption Display    │ ◄────── │  faster-whisper transcriber │
│  (word-by-word)     │         │      (transcriber.py)       │
│                     │         │            │                │
│  Settings Panel     │ ──────► │  SQLite settings persistence│
│                     │ REST API│      (database.py)          │
└─────────────────────┘         └─────────────────────────────┘
```

### Data Flow

1. Browser captures microphone audio using MediaRecorder API
2. Audio sent as base64-encoded WebM chunks via WebSocket
3. Backend converts WebM to WAV using pydub/ffmpeg
4. faster-whisper transcribes audio to text
5. Text sent back via WebSocket
6. Frontend displays words with animation effect
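
Steps 2 through 5 can be sketched as a single handler. This is a simplified illustration rather than the code in `app.py`: `handle_audio_chunk` is a hypothetical name, and the conversion and transcription steps are passed in as callables so the sketch stays independent of pydub, ffmpeg, and faster-whisper.

```python
import base64
from typing import Callable


def handle_audio_chunk(
    payload: dict,
    convert_to_wav: Callable[[bytes], bytes],
    transcribe: Callable[[bytes], str],
) -> dict:
    """Decode one chunk, convert it, transcribe it, and build the reply payload."""
    if payload.get("format") != "webm":
        raise ValueError(f"unsupported format: {payload.get('format')!r}")
    # Step 2: the browser sends base64-encoded WebM.
    webm_bytes = base64.b64decode(payload["audio"], validate=True)
    # Step 3: WebM -> WAV (pydub/ffmpeg in the real backend; injected here).
    wav_bytes = convert_to_wav(webm_bytes)
    # Step 4: speech -> text (faster-whisper in the real backend; injected here).
    text = transcribe(wav_bytes).strip()
    # Step 5: shape of the `transcription` event payload.
    return {"text": text}
```

Injecting the two heavy steps keeps the handler easy to test with stub functions and makes the data flow explicit without committing to a particular audio library call.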

## API Reference

### REST Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Main UI |
| `/api/health` | GET | Health check |
| `/api/settings` | GET | Get current settings |
| `/api/settings` | PUT | Update settings |
| `/api/settings/reset` | POST | Reset to defaults |
| `/api/recordings` | GET | List saved recordings |
| `/api/recordings/<filename>` | GET | Get recording content |
| `/api/recordings/<filename>` | DELETE | Delete recording |
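
A small client sketch using only the Python standard library may help when scripting against these endpoints. It assumes the server runs on the default port, that `PUT /api/settings` accepts a JSON body, and that responses are JSON; the `font_size` key in the usage note is a hypothetical setting name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default PORT from the configuration table


def settings_update_request(changes: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT /api/settings request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/api/settings",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def fetch_json(request_or_url):
    """Send a request and decode the response (assumes the server returns JSON)."""
    with urllib.request.urlopen(request_or_url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `fetch_json(f"{BASE_URL}/api/health")` performs the health check, and `fetch_json(settings_update_request({"font_size": 48}))` sends a settings update.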

### WebSocket Events

| Event | Direction | Payload |
|-------|-----------|---------|
| `audio_data` | client → server | `{audio: base64, format: 'webm'}` |
| `transcription` | server → client | `{text: string}` |
| `settings_updated` | server → client | settings object |
| `start_recording` | client → server | - |
| `stop_recording` | client → server | - |
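
To make the `audio_data` payload shape concrete, here is a minimal sketch of how a non-browser client might build it. `make_audio_event` is a hypothetical helper, not part of the project.

```python
import base64


def make_audio_event(webm_bytes: bytes) -> tuple[str, dict]:
    """Return the (event name, payload) pair for one recorded WebM chunk."""
    payload = {
        "audio": base64.b64encode(webm_bytes).decode("ascii"),
        "format": "webm",
    }
    return "audio_data", payload
```

Assuming a Socket.IO client such as the `python-socketio` package, the pair can be passed straight to its `emit` call, for example `sio.emit(*make_audio_event(chunk))`.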

## Data Persistence

| Location | Content |
|----------|---------|
| `./data/` | SQLite database for settings |
| `./recordings/` | Saved caption sessions (markdown) |
| `whisper-models` volume | Cached Whisper model files |

## License

MIT