Added README
parent c7becf330c
commit ba069c4ded
218
README.MD
@ -1,3 +1,217 @@
# Live Captions

Real-time speech-to-text captions displayed in a small, customizable browser window, running entirely locally using OpenAI's Whisper model.

## Features

- **Local Processing**: All transcription happens on your machine; no data is sent to external services
- **Real-time Captions**: Audio is captured and transcribed in small chunks for near-instant feedback
- **Customizable Display**: Adjust font, colors, size, background opacity, and more
- **Recording Support**: Save caption sessions as markdown files
- **GPU Acceleration**: Optional NVIDIA GPU support for faster transcription
- **Docker-based**: Easy deployment with minimal setup

## Quick Start

### Prerequisites

- Docker and Docker Compose installed
- Microphone access in browser

### Installation

1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd live-captions
   ```

2. Create your environment file:
   ```bash
   cp .env.example .env
   ```

3. Build and run:
   ```bash
   docker compose up --build
   ```

4. Open http://localhost:5000 in your browser

5. Click "Start" and allow microphone access

## Configuration

### Environment Variables

Edit `.env` to customize:

| Variable | Default | Description |
|----------|---------|-------------|
| `WHISPER_MODEL` | `base` | Model size: `tiny`, `base`, `small`, `medium`, `large` |
| `WHISPER_DEVICE` | `cpu` | Processing device: `cpu` or `cuda` |
| `WHISPER_COMPUTE_TYPE` | `int8` | Precision: `int8`, `float16`, `float32` |
| `PORT` | `5000` | Server port |
| `AUDIO_CHUNK_DURATION` | `3` | Seconds of audio per chunk |
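
As an illustration of how these variables and their defaults fit together, here is a minimal Python sketch that reads them from the environment. The names `CaptionConfig` and `load_config` are hypothetical and are not taken from the project's code; the actual backend may read and validate its settings differently.

```python
import os
from dataclasses import dataclass

# Allowed values, copied from the table above.
VALID_MODELS = {"tiny", "base", "small", "medium", "large"}
VALID_DEVICES = {"cpu", "cuda"}


@dataclass(frozen=True)
class CaptionConfig:
    """Hypothetical container for the documented settings."""
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    port: int
    audio_chunk_duration: float


def load_config(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    model = env.get("WHISPER_MODEL", "base")
    device = env.get("WHISPER_DEVICE", "cpu")
    if model not in VALID_MODELS:
        raise ValueError(f"unknown WHISPER_MODEL: {model!r}")
    if device not in VALID_DEVICES:
        raise ValueError(f"unknown WHISPER_DEVICE: {device!r}")
    return CaptionConfig(
        whisper_model=model,
        whisper_device=device,
        whisper_compute_type=env.get("WHISPER_COMPUTE_TYPE", "int8"),
        port=int(env.get("PORT", "5000")),
        audio_chunk_duration=float(env.get("AUDIO_CHUNK_DURATION", "3")),
    )
```

Calling `load_config()` with no arguments reads the process environment; passing a plain dict makes the function easy to exercise in isolation.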

### Model Sizes

| Model | Parameters | Speed | Accuracy | RAM Required |
|-------|------------|-------|----------|--------------|
| `tiny` | 39M | Fastest | Lower | ~1GB |
| `base` | 74M | Fast | Good | ~1GB |
| `small` | 244M | Medium | Better | ~2GB |
| `medium` | 769M | Slower | High | ~5GB |
| `large` | 1550M | Slowest | Highest | ~10GB |
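
To make the trade-off in the table concrete, the following Python sketch picks the largest model whose approximate RAM requirement fits a given budget. The helper `largest_model_for` is hypothetical, and the figures are the rough values from the table above, not measurements.

```python
# Approximate RAM requirements in GB, ordered from smallest to largest model
# (values copied from the table above).
MODEL_RAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_for(ram_gb: float) -> str:
    """Return the largest model whose approximate requirement fits in ram_gb."""
    fitting = [name for name, needed in MODEL_RAM_GB if needed <= ram_gb]
    if not fitting:
        raise ValueError(f"no model fits in {ram_gb} GB")
    return fitting[-1]
```

The result can be used as the value of `WHISPER_MODEL` in `.env`; leave headroom for the rest of the system when choosing a budget.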

### Display Settings

Access the settings panel in the web UI to customize:
- Font family, size, and weight
- Text and background colors
- Background opacity and border radius
- Maximum words displayed

Settings persist in a local SQLite database.

## Docker Commands

```bash
# Build and run
docker compose up --build

# Run in background
docker compose up -d --build

# View logs
docker compose logs -f

# Stop
docker compose down

# Reset all data (database + cached models)
docker compose down -v
```

## NVIDIA GPU Support

GPU acceleration can make transcription roughly 3-10x faster than CPU, depending on the model and hardware.

### Prerequisites

1. NVIDIA GPU with CUDA support
2. NVIDIA driver installed (verify with `nvidia-smi`)
3. Docker installed

### Install NVIDIA Container Toolkit

```bash
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify installation
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

### Enable GPU Mode

1. Update `.env`:
   ```env
   WHISPER_DEVICE=cuda
   WHISPER_COMPUTE_TYPE=float16
   ```

2. Run with GPU compose file:
   ```bash
   docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
   ```

### GPU Compute Types

| Type | Speed | Memory | Notes |
|------|-------|--------|-------|
| `float16` | Fast | Medium | Recommended for most GPUs |
| `int8_float16` | Faster | Lower | Good balance of speed/memory |
| `float32` | Slower | Higher | Maximum precision |

### GPU Troubleshooting

- **"could not select device driver"**: The NVIDIA Container Toolkit is not installed, or Docker was not restarted after installing it
- **CUDA out of memory**: Try a smaller model (`WHISPER_MODEL=small` or `tiny`)
- **Verify GPU access**:
  ```bash
  docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
  ```

## Architecture

```
Browser                          Docker Container
┌─────────────────────┐          ┌─────────────────────────────┐
│  MediaRecorder API  │          │  Flask + Flask-SocketIO     │
│  (audio chunks)     │ ──────►  │  (app.py)                   │
│                     │ WebSocket│         │                   │
│  Caption Display    │ ◄──────  │  faster-whisper transcriber │
│  (word-by-word)     │          │  (transcriber.py)           │
│                     │          │         │                   │
│  Settings Panel     │ ──────►  │  SQLite settings persistence│
│                     │  REST API│  (database.py)              │
└─────────────────────┘          └─────────────────────────────┘
```

### Data Flow

1. The browser captures microphone audio using the MediaRecorder API
2. Audio is sent as base64-encoded WebM chunks via WebSocket
3. The backend converts WebM to WAV using pydub/ffmpeg
4. faster-whisper transcribes the audio to text
5. Text is sent back via WebSocket
6. The frontend displays words with an animation effect
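
Step 2 can be sketched in Python as a round trip through the `audio_data` payload shape (`audio` as base64 text, `format` set to `'webm'`). The function names below are hypothetical and only illustrate the encoding; the WebM-to-WAV conversion and the transcription in steps 3 and 4 depend on pydub/ffmpeg and faster-whisper and are not shown.

```python
import base64


def build_audio_payload(webm_bytes: bytes) -> dict:
    """Wrap raw WebM bytes in the audio_data payload shape (client side)."""
    return {
        "audio": base64.b64encode(webm_bytes).decode("ascii"),
        "format": "webm",
    }


def decode_audio_payload(payload: dict) -> bytes:
    """Recover the raw WebM bytes from an audio_data payload (server side)."""
    if payload.get("format") != "webm":
        raise ValueError(f"unsupported audio format: {payload.get('format')!r}")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(payload["audio"], validate=True)
```

Base64 keeps the payload text-only at the cost of about one third more bytes per chunk, which is a small overhead for chunks a few seconds long.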

## API Reference

### REST Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Main UI |
| `/api/health` | GET | Health check |
| `/api/settings` | GET | Get current settings |
| `/api/settings` | PUT | Update settings |
| `/api/settings/reset` | POST | Reset to defaults |
| `/api/recordings` | GET | List saved recordings |
| `/api/recordings/<filename>` | GET | Get recording content |
| `/api/recordings/<filename>` | DELETE | Delete recording |
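
As a rough illustration of calling these endpoints from a script, the Python sketch below builds a `PUT /api/settings` request with the standard library. It assumes the server listens on the default `http://localhost:5000` and accepts a JSON body; the helper name `settings_update_request` and the `font_size` key in the usage note are hypothetical, since the settings schema is not documented here.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # default PORT from the configuration table


def settings_update_request(changes: dict, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a PUT /api/settings request with a JSON body."""
    body = json.dumps(changes).encode("utf-8")
    return Request(
        f"{base_url}/api/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

With the server running, a request built as `settings_update_request({"font_size": 32})` can be sent with `urllib.request.urlopen(req)`.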

### WebSocket Events

| Event | Direction | Payload |
|-------|-----------|---------|
| `audio_data` | client → server | `{audio: base64, format: 'webm'}` |
| `transcription` | server → client | `{text: string}` |
| `settings_updated` | server → client | settings object |
| `start_recording` | client → server | - |
| `stop_recording` | client → server | - |

## Data Persistence

| Location | Content |
|----------|---------|
| `./data/` | SQLite database for settings |
| `./recordings/` | Saved caption sessions (markdown) |
| `whisper-models` volume | Cached Whisper model files |

## License

MIT