Initial commit: Live Captions web application

Real-time speech-to-text using OpenAI Whisper (faster-whisper).
Features browser audio capture, WebSocket streaming, and customizable display settings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
bunker-admin 2026-01-12 08:53:40 -07:00
commit c7becf330c
18 changed files with 2633 additions and 0 deletions

33
.env.example Normal file
View File

@@ -0,0 +1,33 @@
# Server settings
HOST=0.0.0.0
PORT=5000
DEBUG=false
# Whisper settings
WHISPER_MODEL=base
# Device: cpu or cuda (for NVIDIA GPU)
WHISPER_DEVICE=cpu
# Compute type:
# CPU: int8 (fastest), float32
# GPU: float16 (recommended), int8_float16, float32
WHISPER_COMPUTE_TYPE=int8
# Audio settings
AUDIO_CHUNK_DURATION=3
AUDIO_SAMPLE_RATE=16000
# Database
DATABASE_PATH=data/settings.db
# =============================================================================
# GPU Configuration (optional)
# =============================================================================
# To enable NVIDIA GPU support:
# 1. Install NVIDIA Container Toolkit (see CLAUDE.md for instructions)
# 2. Set WHISPER_DEVICE=cuda
# 3. Set WHISPER_COMPUTE_TYPE=float16 (recommended for GPU)
# 4. Run with: docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
#
# Example GPU settings:
# WHISPER_DEVICE=cuda
# WHISPER_COMPUTE_TYPE=float16

28
.gitignore vendored Normal file
View File

@@ -0,0 +1,28 @@
# Environment
.env
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
ENV/
# Data
data/
recordings/
# IDE
.vscode/
.idea/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Whisper models cache (if running locally)
.cache/

154
CLAUDE.md Normal file
View File

@@ -0,0 +1,154 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Live Captions is a Dockerized web application that provides real-time speech-to-text captions using OpenAI's Whisper model (via faster-whisper). It captures microphone audio in the browser, streams it to a Flask backend for transcription, and displays captions with customizable styling.
## Commands
### Development
```bash
# Build and run (primary development command)
docker compose up --build
# Run in background
docker compose up -d --build
# View logs
docker compose logs -f
# Stop
docker compose down
# Reset all data (database + cached models)
docker compose down -v
```
### First-time setup
```bash
cp .env.example .env
docker compose up --build
```
## Architecture
```
Browser Docker Container
┌─────────────────────┐ ┌─────────────────────────────┐
│ MediaRecorder API │ │ Flask + Flask-SocketIO │
│ (1.5s audio chunks)│ ──────► │ (app.py) │
│ │ WebSocket│ │ │
│ Caption Display │ ◄────── │ faster-whisper transcriber │
│ (word-by-word) │ │ (transcriber.py) │
│ │ │ │ │
│ Settings Panel │ ──────► │ SQLite settings persistence│
│ │ REST API│ (database.py) │
└─────────────────────┘ └─────────────────────────────┘
```
### Data Flow
1. Browser captures mic audio using MediaRecorder, sends base64-encoded WebM chunks every 1.5s via WebSocket
2. Backend converts WebM→WAV using pydub/ffmpeg, transcribes with faster-whisper
3. Transcribed text sent back via WebSocket `transcription` event
4. Frontend animates words appearing one-by-one for streaming effect
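The encode/decode handshake in steps 1–2 can be sketched in isolation. This is a minimal illustration, not the app's actual code: only the `{audio, format}` payload shape and the base64/bytes handling come from `app.py`; the helper names are made up.

```python
import base64

def encode_chunk(raw: bytes) -> dict:
    # Client side (step 1): base64-encode a WebM chunk into the audio_data payload.
    return {"audio": base64.b64encode(raw).decode("ascii"), "format": "webm"}

def decode_chunk(payload: dict) -> bytes:
    # Server side (step 2): recover raw bytes before the WebM -> WAV conversion.
    audio = payload["audio"]
    if isinstance(audio, str):  # base64 text, as the browser sends it
        return base64.b64decode(audio)
    return audio  # already raw bytes

chunk = b"\x1aE\xdf\xa3fake-webm"
assert decode_chunk(encode_chunk(chunk)) == chunk
```

The roundtrip mirrors the `isinstance(audio_bytes, str)` branch in `handle_audio_data`, which accepts either base64 text or raw bytes.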
### Key Files
- **app.py**: Flask server with SocketIO WebSocket handlers and REST API for settings
- **transcriber.py**: Whisper model loading and audio transcription (singleton model instance)
- **database.py**: SQLite CRUD for user display preferences
- **static/js/app.js**: Audio capture, WebSocket client, word animation queue
- **static/js/settings.js**: Settings panel UI and persistence
## Configuration
Environment variables in `.env`:
- `WHISPER_MODEL`: Model size (tiny/base/small/medium/large) - affects accuracy vs speed
- `WHISPER_DEVICE`: cpu or cuda
- `WHISPER_COMPUTE_TYPE`: int8/float16/float32
User display settings stored in SQLite (`data/settings.db`):
- Font family, size, weight, color
- Background color, opacity, border radius, padding
- Max words (controls caption buffer length)
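A minimal sketch of reading these variables with their `.env.example` defaults. The variable names and default values come from the files above; the helper itself is illustrative and not part of the codebase:

```python
import os

def load_whisper_config(env=None) -> dict:
    # Defaults mirror .env.example: base model, CPU, int8 compute type.
    env = os.environ if env is None else env
    return {
        "model": env.get("WHISPER_MODEL", "base"),
        "device": env.get("WHISPER_DEVICE", "cpu"),
        "compute_type": env.get("WHISPER_COMPUTE_TYPE", "int8"),
    }

# GPU-style override, matching the example settings in .env.example
cfg = load_whisper_config({"WHISPER_DEVICE": "cuda", "WHISPER_COMPUTE_TYPE": "float16"})
assert cfg["device"] == "cuda"
```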
## API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/` | GET | Main UI |
| `/api/health` | GET | Health check |
| `/api/settings` | GET/PUT | Read/update user settings |
| `/api/settings/reset` | POST | Reset to defaults |
## WebSocket Events
| Event | Direction | Payload |
|-------|-----------|---------|
| `audio_data` | client→server | `{audio: base64, format: 'webm'}` |
| `transcription` | server→client | `{text: string}` |
| `settings_updated` | server→client | settings object |
## Volumes
- `./data:/app/data` - SQLite database persistence
- `whisper-models` - Cached Whisper model files (~140MB for base)
## NVIDIA GPU Support
GPU acceleration significantly improves transcription speed. Follow these steps to enable it.
### Prerequisites
1. NVIDIA GPU with CUDA support
2. NVIDIA driver installed (`nvidia-smi` should work)
3. Docker installed
### Install NVIDIA Container Toolkit
```bash
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify installation
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
### Configure for GPU
1. Update `.env`:
```env
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
```
2. Run with GPU support:
```bash
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
```
### GPU Compute Types
| Type | Speed | Memory | Notes |
|------|-------|--------|-------|
| `float16` | Fast | Medium | Recommended for most GPUs |
| `int8_float16` | Faster | Lower | Good balance |
| `float32` | Slower | Higher | Maximum precision |
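The table's recommendations can be captured in a small helper. This is illustrative only; the mapping is taken directly from the table and the `.env.example` comments:

```python
def recommended_compute_type(device: str, low_memory: bool = False) -> str:
    # float16 is the recommended GPU type; int8_float16 trades a little
    # precision for lower memory; int8 is the CPU default from .env.example.
    if device == "cuda":
        return "int8_float16" if low_memory else "float16"
    return "int8"

assert recommended_compute_type("cuda") == "float16"
assert recommended_compute_type("cpu") == "int8"
```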
### Troubleshooting
- **"could not select device driver"**: NVIDIA Container Toolkit not installed or Docker not restarted
- **CUDA out of memory**: Try a smaller model (`WHISPER_MODEL=small` or `tiny`)
- **Verify GPU access**: `docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi`

45
Dockerfile Normal file
View File

@@ -0,0 +1,45 @@
FROM python:3.11-slim
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
libsndfile1 \
&& rm -rf /var/lib/apt/lists/*
# Create app directory
WORKDIR /app
# Create non-root user
RUN useradd -m -u 1000 appuser
# Copy requirements first for better caching
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create data and recordings directories
RUN mkdir -p /app/data /app/recordings && chown -R appuser:appuser /app
# Create directory for Whisper models cache
RUN mkdir -p /home/appuser/.cache/huggingface && chown -R appuser:appuser /home/appuser
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/api/health')" || exit 1
# Run the application
CMD ["python", "app.py"]

54
Dockerfile.gpu Normal file
View File

@@ -0,0 +1,54 @@
# GPU-enabled Dockerfile for NVIDIA CUDA support
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV DEBIAN_FRONTEND=noninteractive
# Install Python and system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.11 \
python3.11-venv \
python3-pip \
ffmpeg \
libsndfile1 \
&& rm -rf /var/lib/apt/lists/*
# Set Python 3.11 as default
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1 \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
# Create app directory
WORKDIR /app
# Create non-root user
RUN useradd -m -u 1000 appuser
# Copy requirements first for better caching
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create data and recordings directories
RUN mkdir -p /app/data /app/recordings && chown -R appuser:appuser /app
# Create directory for Whisper models cache
RUN mkdir -p /home/appuser/.cache/huggingface && chown -R appuser:appuser /home/appuser
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/api/health')" || exit 1
# Run the application
CMD ["python", "app.py"]

3
README.MD Normal file
View File

@@ -0,0 +1,3 @@
# Live Captions
Live Captions is a project that displays live captions on screen, in a small, customizable browser window, entirely locally.

235
app.py Normal file
View File

@@ -0,0 +1,235 @@
"""
Live Captions - Flask Application
A web-based live captioning application using Whisper for speech recognition.
"""
import os
import logging
from datetime import datetime
from flask import Flask, render_template, jsonify, request
from flask_socketio import SocketIO, emit
from dotenv import load_dotenv
import database
import transcriber
import recordings
# Load environment variables
load_dotenv()
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Initialize Flask app
app = Flask(__name__)
app.config['SECRET_KEY'] = os.environ.get('SECRET_KEY', 'live-captions-secret')
# Initialize SocketIO with gevent
socketio = SocketIO(
app,
cors_allowed_origins="*",
async_mode='gevent'
)
# =============================================================================
# Routes
# =============================================================================
@app.route('/')
def index():
"""Serve the main page."""
return render_template('index.html')
@app.route('/api/health')
def health():
"""Health check endpoint."""
return jsonify({'status': 'healthy'})
@app.route('/api/settings', methods=['GET'])
def get_settings():
"""Get current user settings."""
settings = database.get_settings()
return jsonify(settings)
@app.route('/api/settings', methods=['PUT'])
def update_settings():
"""Update user settings."""
data = request.get_json()
if not data:
return jsonify({'error': 'No data provided'}), 400
settings = database.update_settings(data)
# Broadcast settings update to all clients
socketio.emit('settings_updated', settings)
return jsonify(settings)
@app.route('/api/settings/reset', methods=['POST'])
def reset_settings():
"""Reset settings to defaults."""
settings = database.reset_settings()
# Broadcast settings update to all clients
socketio.emit('settings_updated', settings)
return jsonify(settings)
@app.route('/api/recordings', methods=['GET'])
def list_recordings():
"""List all saved recordings."""
return jsonify(recordings.list_recordings())
@app.route('/api/recordings/<filename>', methods=['GET'])
def get_recording(filename):
"""Get a specific recording's content."""
recording = recordings.get_recording(filename)
if recording:
return jsonify(recording)
return jsonify({'error': 'Recording not found'}), 404
@app.route('/api/recordings/<filename>', methods=['DELETE'])
def delete_recording(filename):
"""Delete a specific recording."""
if recordings.delete_recording(filename):
return jsonify({'success': True})
return jsonify({'error': 'Failed to delete recording'}), 400
# =============================================================================
# WebSocket Events
# =============================================================================
@socketio.on('connect')
def handle_connect():
"""Handle client connection."""
logger.info(f"Client connected: {request.sid}")
# Send current settings to the newly connected client
settings = database.get_settings()
emit('settings_updated', settings)
@socketio.on('disconnect')
def handle_disconnect():
"""Handle client disconnection."""
logger.info(f"Client disconnected: {request.sid}")
@socketio.on('audio_data')
def handle_audio_data(data):
"""
Handle incoming audio data from client.
Args:
data: Dictionary containing 'audio' (base64 or bytes) and 'format'
"""
try:
audio_bytes = data.get('audio')
audio_format = data.get('format', 'webm')
if not audio_bytes:
return
# Handle base64 encoded audio
if isinstance(audio_bytes, str):
import base64
audio_bytes = base64.b64decode(audio_bytes)
# Transcribe audio
text = transcriber.transcribe_audio(audio_bytes, format=audio_format)
if text:
logger.info(f"Transcription: {text}")
emit('transcription', {'text': text})
except Exception as e:
logger.error(f"Error processing audio: {e}")
emit('error', {'message': 'Failed to process audio'})
@socketio.on('save_recording')
def handle_save_recording(data):
"""Handle saving a recording session."""
client_id = request.sid
try:
# Parse timestamps from client
start_time_str = data.get('startTime')
end_time_str = data.get('endTime')
if start_time_str:
start_time = datetime.fromisoformat(start_time_str.replace('Z', '+00:00'))
else:
start_time = datetime.now()
if end_time_str:
end_time = datetime.fromisoformat(end_time_str.replace('Z', '+00:00'))
else:
end_time = datetime.now()
transcript = data.get('transcript', '')
word_count = data.get('wordCount', 0)
# Save the recording
filename = recordings.save_recording(
start_time=start_time,
end_time=end_time,
transcript=transcript,
word_count=word_count,
client_id=client_id
)
if filename:
logger.info(f"Recording saved: {filename}")
emit('recording_saved', {'filename': filename})
else:
emit('recording_error', {'message': 'Failed to save recording'})
except Exception as e:
logger.error(f"Error saving recording: {e}")
emit('recording_error', {'message': str(e)})
# =============================================================================
# Startup
# =============================================================================
def initialize():
"""Initialize application components."""
logger.info("Initializing Live Captions...")
# Initialize database
database.init_db()
logger.info("Database initialized")
# Preload Whisper model
logger.info("Preloading Whisper model (this may take a moment)...")
if transcriber.preload_model():
logger.info("Whisper model ready")
else:
logger.warning("Failed to preload Whisper model")
if __name__ == '__main__':
initialize()
host = os.environ.get('HOST', '0.0.0.0')
port = int(os.environ.get('PORT', 5000))
debug = os.environ.get('DEBUG', 'false').lower() == 'true'
logger.info(f"Starting Live Captions on {host}:{port}")
socketio.run(app, host=host, port=port, debug=debug)

168
database.py Normal file
View File

@@ -0,0 +1,168 @@
"""
SQLite database module for user settings persistence.
"""
import sqlite3
import os
from datetime import datetime
# Default settings
DEFAULT_SETTINGS = {
'font_family': 'Arial, sans-serif',
'font_size': 32,
'font_weight': 'normal',
'text_color': '#ffffff',
'background_color': '#1a1a2e',
'background_opacity': 0.9,
'max_words': 30,
'text_align': 'center',
'padding': 20,
'border_radius': 10,
}
def get_db_path():
"""Get database path from environment or use default."""
return os.environ.get('DATABASE_PATH', 'data/settings.db')
def get_connection():
"""Create a database connection."""
db_path = get_db_path()
# Ensure directory exists (db_path may be a bare filename with no directory part)
db_dir = os.path.dirname(db_path)
if db_dir:
os.makedirs(db_dir, exist_ok=True)
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
return conn
def init_db():
"""Initialize the database with the settings table."""
conn = get_connection()
cursor = conn.cursor()
# Check if table exists
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='user_settings'")
table_exists = cursor.fetchone() is not None
if table_exists:
# Check if we need to migrate from max_lines to max_words
cursor.execute("PRAGMA table_info(user_settings)")
columns = [col[1] for col in cursor.fetchall()]
if 'max_lines' in columns and 'max_words' not in columns:
# Add max_words column
cursor.execute('ALTER TABLE user_settings ADD COLUMN max_words INTEGER DEFAULT 30')
conn.commit()
# Remove old columns that are no longer needed (fade_delay, max_lines)
# SQLite doesn't support DROP COLUMN easily, so we just ignore old columns
else:
# Create settings table
cursor.execute('''
CREATE TABLE IF NOT EXISTS user_settings (
id INTEGER PRIMARY KEY DEFAULT 1,
font_family TEXT DEFAULT 'Arial, sans-serif',
font_size INTEGER DEFAULT 32,
font_weight TEXT DEFAULT 'normal',
text_color TEXT DEFAULT '#ffffff',
background_color TEXT DEFAULT '#1a1a2e',
background_opacity REAL DEFAULT 0.9,
max_words INTEGER DEFAULT 30,
text_align TEXT DEFAULT 'center',
padding INTEGER DEFAULT 20,
border_radius INTEGER DEFAULT 10,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Insert default settings if table is empty
cursor.execute('SELECT COUNT(*) FROM user_settings')
if cursor.fetchone()[0] == 0:
columns = ', '.join(DEFAULT_SETTINGS.keys())
placeholders = ', '.join(['?' for _ in DEFAULT_SETTINGS])
cursor.execute(
f'INSERT INTO user_settings ({columns}) VALUES ({placeholders})',
list(DEFAULT_SETTINGS.values())
)
conn.commit()
conn.close()
def get_settings():
"""Fetch current user settings."""
conn = get_connection()
cursor = conn.cursor()
cursor.execute('SELECT * FROM user_settings WHERE id = 1')
row = cursor.fetchone()
conn.close()
if row:
# Convert to dict and exclude id and timestamps
settings = dict(row)
for key in ['id', 'created_at', 'updated_at', 'max_lines', 'fade_delay']:
settings.pop(key, None)
# Ensure max_words exists (for migration)
if 'max_words' not in settings:
settings['max_words'] = DEFAULT_SETTINGS['max_words']
return settings
return DEFAULT_SETTINGS.copy()
def update_settings(settings_dict):
"""Update user settings with provided values."""
if not settings_dict:
return get_settings()
conn = get_connection()
cursor = conn.cursor()
# Build UPDATE query with only valid columns
valid_columns = set(DEFAULT_SETTINGS.keys())
updates = []
values = []
for key, value in settings_dict.items():
if key in valid_columns:
updates.append(f'{key} = ?')
values.append(value)
if updates:
updates.append('updated_at = ?')
values.append(datetime.now().isoformat())
query = f'UPDATE user_settings SET {", ".join(updates)} WHERE id = 1'
cursor.execute(query, values)
conn.commit()
conn.close()
return get_settings()
def reset_settings():
"""Reset all settings to defaults."""
conn = get_connection()
cursor = conn.cursor()
# Delete existing and insert defaults
cursor.execute('DELETE FROM user_settings')
columns = ', '.join(DEFAULT_SETTINGS.keys())
placeholders = ', '.join(['?' for _ in DEFAULT_SETTINGS])
cursor.execute(
f'INSERT INTO user_settings ({columns}) VALUES ({placeholders})',
list(DEFAULT_SETTINGS.values())
)
conn.commit()
conn.close()
return DEFAULT_SETTINGS.copy()

21
docker-compose.gpu.yml Normal file
View File

@@ -0,0 +1,21 @@
# GPU override for docker-compose
# Usage: docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
#
# Prerequisites:
# 1. NVIDIA GPU with driver installed
# 2. NVIDIA Container Toolkit installed
# 3. Set WHISPER_DEVICE=cuda in .env
# 4. Set WHISPER_COMPUTE_TYPE=float16 in .env (recommended)
services:
live-captions:
build:
context: .
dockerfile: Dockerfile.gpu
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]

29
docker-compose.yml Normal file
View File

@@ -0,0 +1,29 @@
services:
live-captions:
build: .
container_name: live-captions
ports:
- "${PORT:-5000}:5000"
volumes:
# Persist SQLite database
- ./data:/app/data
# Persist Whisper models
- whisper-models:/home/appuser/.cache/huggingface
# Persist recordings
- ./recordings:/app/recordings
env_file:
- .env
environment:
- HOST=0.0.0.0
- PORT=5000
restart: unless-stopped
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:5000/api/health')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
volumes:
whisper-models:
name: live-captions-whisper-models

208
recordings.py Normal file
View File

@@ -0,0 +1,208 @@
"""
Recording session management and file saving.
"""
import os
import logging
from datetime import datetime
from typing import Optional
logger = logging.getLogger(__name__)
# Default recordings directory
RECORDINGS_DIR = os.environ.get('RECORDINGS_PATH', '/app/recordings')
def ensure_recordings_dir():
"""Ensure the recordings directory exists."""
os.makedirs(RECORDINGS_DIR, exist_ok=True)
return RECORDINGS_DIR
def generate_filename(start_time: datetime) -> str:
"""
Generate a filename from the session start time.
Format: YYYY-MM-DD_HH-MM-SS_captions.md
"""
return start_time.strftime('%Y-%m-%d_%H-%M-%S_captions.md')
def calculate_duration(start_time: datetime, end_time: datetime) -> str:
"""Calculate and format duration as HH:MM:SS."""
delta = end_time - start_time
total_seconds = int(delta.total_seconds())
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)
return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
def get_whisper_model_name() -> str:
"""Get the configured Whisper model name."""
return os.environ.get('WHISPER_MODEL', 'base')
def save_recording(
start_time: datetime,
end_time: datetime,
transcript: str,
word_count: int,
client_id: str
) -> Optional[str]:
"""
Save a recording session to a markdown file.
Args:
start_time: Session start datetime
end_time: Session end datetime
transcript: Full transcript text
word_count: Number of words in transcript
client_id: WebSocket client session ID
Returns:
Filename if successful, None if failed
"""
try:
ensure_recordings_dir()
filename = generate_filename(start_time)
filepath = os.path.join(RECORDINGS_DIR, filename)
duration = calculate_duration(start_time, end_time)
model_name = get_whisper_model_name()
# Build markdown content with frontmatter
content = f"""---
session_start: {start_time.isoformat()}
session_end: {end_time.isoformat()}
duration: {duration}
whisper_model: {model_name}
word_count: {word_count}
---
# Live Captions Recording
**Session Start:** {start_time.strftime('%Y-%m-%d %H:%M:%S')}
**Session End:** {end_time.strftime('%Y-%m-%d %H:%M:%S')}
**Duration:** {duration}
**Model:** {model_name}
**Words:** {word_count}
---
## Transcript
{transcript}
"""
with open(filepath, 'w', encoding='utf-8') as f:
f.write(content)
logger.info(f"Recording saved: {filename} ({word_count} words)")
return filename
except Exception as e:
logger.error(f"Failed to save recording: {e}")
return None
def list_recordings() -> list:
"""
List all recording files, sorted by date descending.
Returns:
List of recording metadata dicts
"""
ensure_recordings_dir()
recordings = []
try:
for filename in os.listdir(RECORDINGS_DIR):
if filename.endswith('_captions.md'):
filepath = os.path.join(RECORDINGS_DIR, filename)
stat = os.stat(filepath)
# Parse date from filename (YYYY-MM-DD_HH-MM-SS_captions.md)
try:
date_str = filename.replace('_captions.md', '')
date_parts = date_str.split('_')
display_date = f"{date_parts[0]} {date_parts[1].replace('-', ':')}"
except (IndexError, ValueError):
display_date = filename
recordings.append({
'filename': filename,
'date': display_date,
'size': stat.st_size,
'created': datetime.fromtimestamp(stat.st_mtime).isoformat()
})
# Sort by filename descending (newest first)
recordings.sort(key=lambda x: x['filename'], reverse=True)
except Exception as e:
logger.error(f"Failed to list recordings: {e}")
return recordings
def get_recording(filename: str) -> Optional[dict]:
"""
Get a specific recording's content.
Args:
filename: The recording filename
Returns:
Dict with filename and content, or None if not found
"""
ensure_recordings_dir()
# Sanitize filename to prevent path traversal
safe_filename = os.path.basename(filename)
if not safe_filename.endswith('_captions.md'):
return None
filepath = os.path.join(RECORDINGS_DIR, safe_filename)
try:
if os.path.exists(filepath):
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read()
return {
'filename': safe_filename,
'content': content
}
except Exception as e:
logger.error(f"Failed to read recording {safe_filename}: {e}")
return None
def delete_recording(filename: str) -> bool:
"""
Delete a specific recording.
Args:
filename: The recording filename
Returns:
True if deleted, False otherwise
"""
ensure_recordings_dir()
# Sanitize filename to prevent path traversal
safe_filename = os.path.basename(filename)
if not safe_filename.endswith('_captions.md'):
return False
filepath = os.path.join(RECORDINGS_DIR, safe_filename)
try:
if os.path.exists(filepath):
os.remove(filepath)
logger.info(f"Recording deleted: {safe_filename}")
return True
except Exception as e:
logger.error(f"Failed to delete recording {safe_filename}: {e}")
return False

9
requirements.txt Normal file
View File

@@ -0,0 +1,9 @@
flask>=3.0.0
flask-socketio>=5.3.0
faster-whisper>=1.0.0
pydub>=0.25.1
python-dotenv>=1.0.0
python-engineio>=4.8.0
python-socketio>=5.10.0
gevent>=24.2.1
gevent-websocket>=0.10.1

567
static/css/style.css Normal file
View File

@@ -0,0 +1,567 @@
/**
* Live Captions - Stylesheet
*/
/* Reset and Base */
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
:root {
--bg-primary: #0d0d1a;
--bg-secondary: #1a1a2e;
--bg-tertiary: #252542;
--text-primary: #ffffff;
--text-secondary: #a0a0b0;
--accent: #4a9eff;
--accent-hover: #6ab0ff;
--danger: #ff4a6a;
--danger-hover: #ff6a85;
--success: #4aff8a;
--warning: #ffa64a;
--border-radius: 8px;
--transition: 0.2s ease;
}
html, body {
height: 100%;
font-family: 'Segoe UI', system-ui, -apple-system, sans-serif;
background-color: var(--bg-primary);
color: var(--text-primary);
}
#app {
display: flex;
flex-direction: column;
height: 100vh;
padding: 20px;
}
/* Caption Container */
#caption-container {
flex: 1;
display: flex;
flex-direction: column;
justify-content: center;
background-color: rgba(26, 26, 46, 0.9);
border-radius: 10px;
padding: 20px;
margin-bottom: 20px;
overflow: hidden;
font-size: 32px;
font-family: Arial, sans-serif;
text-align: center;
}
#captions {
line-height: 1.4;
word-wrap: break-word;
overflow-wrap: break-word;
}
/* Controls Bar */
#controls {
display: flex;
gap: 10px;
justify-content: center;
align-items: center;
padding: 15px;
background-color: var(--bg-secondary);
border-radius: var(--border-radius);
}
/* Buttons */
.btn {
display: inline-flex;
align-items: center;
gap: 8px;
padding: 12px 24px;
border: none;
border-radius: var(--border-radius);
font-size: 16px;
font-weight: 500;
cursor: pointer;
transition: background-color var(--transition), transform var(--transition);
}
.btn:hover:not(:disabled) {
transform: translateY(-1px);
}
.btn:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.btn-primary {
background-color: var(--accent);
color: white;
}
.btn-primary:hover:not(:disabled) {
background-color: var(--accent-hover);
}
.btn-danger {
background-color: var(--danger);
color: white;
}
.btn-danger:hover:not(:disabled) {
background-color: var(--danger-hover);
}
.btn-secondary {
background-color: var(--bg-tertiary);
color: var(--text-primary);
}
.btn-secondary:hover:not(:disabled) {
background-color: #323258;
}
.btn-success {
background-color: var(--success);
color: #000;
}
.btn-success:hover:not(:disabled) {
background-color: #5aff9a;
}
/* Toggle Switch */
.toggle-switch {
display: flex;
align-items: center;
gap: 10px;
cursor: pointer;
user-select: none;
padding: 8px 12px;
background-color: var(--bg-tertiary);
border-radius: var(--border-radius);
}
.toggle-switch input {
display: none;
}
.toggle-slider {
position: relative;
width: 44px;
height: 24px;
background-color: var(--text-secondary);
border-radius: 12px;
transition: background-color var(--transition);
}
.toggle-slider::before {
content: '';
position: absolute;
top: 3px;
left: 3px;
width: 18px;
height: 18px;
background-color: white;
border-radius: 50%;
transition: transform var(--transition);
}
.toggle-switch input:checked + .toggle-slider {
background-color: var(--success);
}
.toggle-switch input:checked + .toggle-slider::before {
transform: translateX(20px);
}
.toggle-label {
font-size: 14px;
font-weight: 500;
color: var(--text-primary);
}
.btn-icon {
width: 48px;
height: 48px;
padding: 0;
display: flex;
align-items: center;
justify-content: center;
background-color: var(--bg-tertiary);
color: var(--text-primary);
font-size: 20px;
}
.btn-icon:hover:not(:disabled) {
background-color: #323258;
}
.icon {
font-size: 14px;
}
/* Status Indicator */
#status {
display: flex;
align-items: center;
gap: 8px;
justify-content: center;
padding: 10px;
font-size: 14px;
color: var(--text-secondary);
}
.dot {
width: 10px;
height: 10px;
border-radius: 50%;
background-color: var(--text-secondary);
}
.dot.connected {
background-color: var(--success);
}
.dot.recording {
background-color: var(--danger);
animation: pulse 1s infinite;
}
.dot.disconnected {
background-color: var(--text-secondary);
}
.dot.error {
background-color: var(--warning);
}
@keyframes pulse {
0%, 100% { opacity: 1; }
50% { opacity: 0.5; }
}
/* Settings Panel */
.panel {
position: fixed;
top: 0;
right: 0;
width: 350px;
height: 100vh;
background-color: var(--bg-secondary);
box-shadow: -5px 0 20px rgba(0, 0, 0, 0.3);
z-index: 1000;
display: flex;
flex-direction: column;
transition: transform var(--transition);
}
.panel.hidden {
transform: translateX(100%);
}
.panel-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 20px;
border-bottom: 1px solid var(--bg-tertiary);
}
.panel-header h2 {
font-size: 20px;
font-weight: 600;
}
.btn-close {
width: 36px;
height: 36px;
border: none;
background: var(--bg-tertiary);
color: var(--text-primary);
font-size: 24px;
border-radius: 50%;
cursor: pointer;
transition: background-color var(--transition);
}
.btn-close:hover {
background-color: var(--danger);
}
.panel-content {
flex: 1;
overflow-y: auto;
padding: 20px;
}
/* Settings Groups */
.setting-group {
margin-bottom: 25px;
}
.setting-group h3 {
font-size: 14px;
font-weight: 600;
color: var(--accent);
text-transform: uppercase;
letter-spacing: 1px;
margin-bottom: 15px;
}
.setting-group label {
display: block;
font-size: 14px;
color: var(--text-secondary);
margin-bottom: 5px;
margin-top: 12px;
}
.setting-group label:first-of-type {
margin-top: 0;
}
/* Form Controls */
select,
input[type="text"] {
width: 100%;
padding: 10px 12px;
background-color: var(--bg-tertiary);
border: 1px solid transparent;
border-radius: var(--border-radius);
color: var(--text-primary);
font-size: 14px;
transition: border-color var(--transition);
}
select:focus,
input[type="text"]:focus {
outline: none;
border-color: var(--accent);
}
input[type="range"] {
width: 100%;
height: 6px;
background: var(--bg-tertiary);
border-radius: 3px;
appearance: none;
cursor: pointer;
}
input[type="range"]::-webkit-slider-thumb {
appearance: none;
width: 18px;
height: 18px;
background: var(--accent);
border-radius: 50%;
cursor: pointer;
transition: background-color var(--transition);
}
input[type="range"]::-webkit-slider-thumb:hover {
background: var(--accent-hover);
}
input[type="range"]::-moz-range-thumb {
width: 18px;
height: 18px;
background: var(--accent);
border-radius: 50%;
border: none;
cursor: pointer;
}
input[type="color"] {
width: 100%;
height: 40px;
padding: 2px;
background-color: var(--bg-tertiary);
border: 1px solid transparent;
border-radius: var(--border-radius);
cursor: pointer;
}
input[type="color"]::-webkit-color-swatch-wrapper {
padding: 0;
}
input[type="color"]::-webkit-color-swatch {
border: none;
border-radius: calc(var(--border-radius) - 3px);
}
/* Setting Actions */
.setting-actions {
display: flex;
flex-direction: column;
gap: 10px;
margin-top: 20px;
padding-top: 20px;
border-top: 1px solid var(--bg-tertiary);
}
.setting-actions .btn {
width: 100%;
justify-content: center;
}
/* Overlay */
#overlay {
position: fixed;
top: 0;
left: 0;
width: 100%;
height: 100%;
background-color: rgba(0, 0, 0, 0.5);
z-index: 999;
transition: opacity var(--transition);
}
#overlay.hidden {
opacity: 0;
pointer-events: none;
}
/* Scrollbar Styling */
::-webkit-scrollbar {
width: 8px;
}
::-webkit-scrollbar-track {
background: var(--bg-tertiary);
border-radius: 4px;
}
::-webkit-scrollbar-thumb {
background: var(--text-secondary);
border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
background: var(--accent);
}
/* Recordings Panel */
.recordings-list {
display: flex;
flex-direction: column;
gap: 8px;
}
.recordings-empty {
color: var(--text-secondary);
text-align: center;
padding: 40px 20px;
}
.recording-item {
display: flex;
justify-content: space-between;
align-items: center;
padding: 12px 15px;
background-color: var(--bg-tertiary);
border-radius: var(--border-radius);
cursor: pointer;
transition: background-color var(--transition);
}
.recording-item:hover {
background-color: #323258;
}
.recording-info {
display: flex;
flex-direction: column;
gap: 4px;
}
.recording-date {
font-size: 14px;
font-weight: 500;
color: var(--text-primary);
}
.recording-meta {
font-size: 12px;
color: var(--text-secondary);
}
.recording-arrow {
color: var(--text-secondary);
font-size: 18px;
}
/* Recording Viewer */
.recording-viewer {
display: flex;
flex-direction: column;
height: 100%;
}
.recording-viewer.hidden {
display: none;
}
.viewer-header {
display: flex;
align-items: center;
gap: 12px;
margin-bottom: 15px;
padding-bottom: 15px;
border-bottom: 1px solid var(--bg-tertiary);
}
.viewer-filename {
font-size: 12px;
color: var(--text-secondary);
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
.viewer-content {
flex: 1;
overflow-y: auto;
padding: 15px;
background-color: var(--bg-tertiary);
border-radius: var(--border-radius);
font-size: 14px;
line-height: 1.6;
white-space: pre-wrap;
word-wrap: break-word;
}
.viewer-actions {
display: flex;
justify-content: flex-end;
margin-top: 15px;
padding-top: 15px;
border-top: 1px solid var(--bg-tertiary);
}
.btn-small {
padding: 8px 16px;
font-size: 13px;
}
/* Responsive */
@media (max-width: 600px) {
#app {
padding: 10px;
}
.panel {
width: 100%;
}
#controls {
flex-wrap: wrap;
}
.btn {
padding: 10px 16px;
font-size: 14px;
}
}

static/js/app.js Normal file
@@ -0,0 +1,355 @@
/**
* Live Captions - Main Application
* Handles audio capture and WebSocket communication
*/
const App = {
// WebSocket connection
socket: null,
// Audio recording
mediaRecorder: null,
audioStream: null,
audioChunks: [],
isRecording: false,
recordingInterval: null,
// Continuous caption stream
wordBuffer: [],
pendingWords: [],
wordAnimationTimer: null,
// Auto-save recording state
sessionStartTime: null,
sessionTranscript: [],
// DOM elements
elements: {},
/**
* Initialize the application
*/
init() {
this.cacheElements();
this.bindEvents();
this.connectSocket();
// Initialize settings module
Settings.init();
},
/**
* Cache DOM element references
*/
cacheElements() {
this.elements = {
btnStart: document.getElementById('btn-start'),
btnStop: document.getElementById('btn-stop'),
btnClear: document.getElementById('btn-clear'),
autoSaveToggle: document.getElementById('auto-save-toggle'),
captions: document.getElementById('captions'),
statusDot: document.getElementById('status-dot'),
statusText: document.getElementById('status-text'),
};
},
/**
* Bind event listeners
*/
bindEvents() {
this.elements.btnStart.addEventListener('click', () => this.startRecording());
this.elements.btnStop.addEventListener('click', () => this.stopRecording());
this.elements.btnClear.addEventListener('click', () => this.clearCaptions());
// Load auto-save preference from localStorage
const savedPref = localStorage.getItem('autoSaveEnabled');
if (savedPref === 'true') {
this.elements.autoSaveToggle.checked = true;
}
// Save preference when toggled
this.elements.autoSaveToggle.addEventListener('change', (e) => {
localStorage.setItem('autoSaveEnabled', e.target.checked);
});
},
/**
* Connect to WebSocket server
*/
connectSocket() {
this.socket = io();
this.socket.on('connect', () => {
console.log('Connected to server');
this.setStatus('connected', 'Connected');
});
this.socket.on('disconnect', () => {
console.log('Disconnected from server');
this.setStatus('disconnected', 'Disconnected');
});
this.socket.on('transcription', (data) => {
this.addWords(data.text);
});
this.socket.on('settings_updated', (settings) => {
Settings.applySettings(settings);
});
this.socket.on('error', (data) => {
console.error('Server error:', data.message);
});
this.socket.on('recording_saved', (data) => {
console.log('Recording saved:', data.filename);
});
this.socket.on('recording_error', (data) => {
console.error('Recording error:', data.message);
});
},
/**
* Update status indicator
*/
setStatus(state, text) {
this.elements.statusDot.className = `dot ${state}`;
this.elements.statusText.textContent = text;
},
/**
* Start audio recording
*/
async startRecording() {
try {
this.audioStream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true,
noiseSuppression: true,
sampleRate: 16000,
}
});
this.isRecording = true;
this.elements.btnStart.disabled = true;
this.elements.btnStop.disabled = false;
this.setStatus('recording', 'Recording...');
// Reset session transcript for auto-save
this.sessionStartTime = new Date();
this.sessionTranscript = [];
// Start the recording cycle
this.startRecordingCycle();
} catch (error) {
console.error('Error starting recording:', error);
// Distinguish a denied permission from other failures (no device, etc.)
const message = error.name === 'NotAllowedError'
? 'Microphone access denied'
: 'Could not start microphone';
this.setStatus('error', message);
}
},
/**
* Start a recording cycle - record for a duration, then send and restart
*/
startRecordingCycle() {
if (!this.isRecording || !this.audioStream) return;
// Determine best supported MIME type
let mimeType = 'audio/webm';
if (MediaRecorder.isTypeSupported('audio/webm;codecs=opus')) {
mimeType = 'audio/webm;codecs=opus';
}
this.audioChunks = [];
this.mediaRecorder = new MediaRecorder(this.audioStream, { mimeType });
this.mediaRecorder.ondataavailable = (event) => {
if (event.data.size > 0) {
this.audioChunks.push(event.data);
}
};
this.mediaRecorder.onstop = () => {
// Create a complete blob from all chunks
if (this.audioChunks.length > 0) {
const blob = new Blob(this.audioChunks, { type: mimeType });
this.sendAudioBlob(blob);
}
// Start next cycle if still recording
if (this.isRecording) {
this.startRecordingCycle();
}
};
// Start recording
this.mediaRecorder.start();
// Stop after a fixed duration so each cycle yields a complete,
// self-contained blob; 1.5 s keeps captioning responsive
const chunkDuration = 1500; // ms
this.recordingInterval = setTimeout(() => {
if (this.mediaRecorder && this.mediaRecorder.state === 'recording') {
this.mediaRecorder.stop();
}
}, chunkDuration);
},
/**
* Stop audio recording
*/
stopRecording() {
this.isRecording = false;
// Clear the recording interval
if (this.recordingInterval) {
clearTimeout(this.recordingInterval);
this.recordingInterval = null;
}
// Stop the media recorder
if (this.mediaRecorder && this.mediaRecorder.state === 'recording') {
this.mediaRecorder.stop();
}
// Stop all tracks
if (this.audioStream) {
this.audioStream.getTracks().forEach(track => track.stop());
this.audioStream = null;
}
this.elements.btnStart.disabled = false;
this.elements.btnStop.disabled = true;
this.setStatus('connected', 'Connected');
// Auto-save if enabled and we have content
if (this.elements.autoSaveToggle.checked && this.sessionTranscript.length > 0) {
this.saveRecording();
}
},
/**
* Send complete audio blob to server
*/
sendAudioBlob(blob) {
const reader = new FileReader();
reader.onloadend = () => {
// Get base64 data without the data URL prefix
const base64 = reader.result.split(',')[1];
this.socket.emit('audio_data', {
audio: base64,
format: 'webm'
});
};
reader.readAsDataURL(blob);
},
/**
* Add words to the continuous caption stream
*/
addWords(text) {
if (!text.trim()) return;
// Split incoming text into words
const newWords = text.trim().split(/\s+/);
// Add to pending queue for animated display
this.pendingWords.push(...newWords);
// Accumulate to session transcript for auto-save
if (this.isRecording) {
this.sessionTranscript.push(...newWords);
}
// Start animation if not already running
if (!this.wordAnimationTimer) {
this.animateNextWord();
}
},
/**
* Animate words appearing one by one
*/
animateNextWord() {
if (this.pendingWords.length === 0) {
this.wordAnimationTimer = null;
return;
}
// Get next word from queue
const word = this.pendingWords.shift();
this.wordBuffer.push(word);
// Get max words from settings
const maxWords = Settings.current.max_words || 30;
// Trim buffer to max words
while (this.wordBuffer.length > maxWords) {
this.wordBuffer.shift();
}
// Update display
this.updateCaptionDisplay();
// Calculate delay based on pending words
// Faster if more words pending, slower if caught up
const baseDelay = 80; // ms per word
const minDelay = 30;
const delay = this.pendingWords.length > 10 ? minDelay : baseDelay;
// Schedule next word
this.wordAnimationTimer = setTimeout(() => {
this.animateNextWord();
}, delay);
},
/**
* Update the caption display with current word buffer
*/
updateCaptionDisplay() {
const text = this.wordBuffer.join(' ');
this.elements.captions.textContent = text;
},
/**
* Clear all captions
*/
clearCaptions() {
// Clear animation timer
if (this.wordAnimationTimer) {
clearTimeout(this.wordAnimationTimer);
this.wordAnimationTimer = null;
}
this.wordBuffer = [];
this.pendingWords = [];
this.elements.captions.textContent = '';
},
/**
* Save the current recording session
*/
saveRecording() {
if (!this.sessionStartTime) return;
const endTime = new Date();
const transcript = this.sessionTranscript.join(' ');
this.socket.emit('save_recording', {
startTime: this.sessionStartTime.toISOString(),
endTime: endTime.toISOString(),
transcript: transcript,
wordCount: this.sessionTranscript.length
});
// Reset session state
this.sessionStartTime = null;
this.sessionTranscript = [];
}
};
// Initialize when DOM is ready
document.addEventListener('DOMContentLoaded', () => {
App.init();
});

static/js/recordings.js Normal file
@@ -0,0 +1,204 @@
/**
* Live Captions - Recordings Panel
* Handles viewing and managing saved recordings
*/
const Recordings = {
// Current state
recordings: [],
currentRecording: null,
// DOM elements
elements: {},
/**
* Initialize the recordings panel
*/
init() {
this.cacheElements();
this.bindEvents();
},
/**
* Cache DOM element references
*/
cacheElements() {
this.elements = {
btnRecordings: document.getElementById('btn-recordings'),
btnClose: document.getElementById('btn-close-recordings'),
btnBackToList: document.getElementById('btn-back-to-list'),
btnDelete: document.getElementById('btn-delete-recording'),
panel: document.getElementById('recordings-panel'),
overlay: document.getElementById('overlay'),
recordingsList: document.getElementById('recordings-list'),
recordingViewer: document.getElementById('recording-viewer'),
viewerFilename: document.getElementById('viewer-filename'),
viewerContent: document.getElementById('viewer-content'),
};
},
/**
* Bind event listeners
*/
bindEvents() {
this.elements.btnRecordings.addEventListener('click', () => this.openPanel());
this.elements.btnClose.addEventListener('click', () => this.closePanel());
this.elements.btnBackToList.addEventListener('click', () => this.showList());
this.elements.btnDelete.addEventListener('click', () => this.deleteCurrentRecording());
// Close on overlay click, but only when this panel is open (the overlay is shared with the settings panel)
this.elements.overlay.addEventListener('click', () => {
if (!this.elements.panel.classList.contains('hidden')) {
this.closePanel();
}
});
},
/**
* Open the recordings panel
*/
openPanel() {
this.elements.panel.classList.remove('hidden');
this.elements.overlay.classList.remove('hidden');
this.showList();
this.loadRecordings();
},
/**
* Close the recordings panel
*/
closePanel() {
this.elements.panel.classList.add('hidden');
this.elements.overlay.classList.add('hidden');
this.currentRecording = null;
},
/**
* Show the recordings list view
*/
showList() {
this.elements.recordingsList.classList.remove('hidden');
this.elements.recordingViewer.classList.add('hidden');
},
/**
* Show the recording viewer
*/
showViewer() {
this.elements.recordingsList.classList.add('hidden');
this.elements.recordingViewer.classList.remove('hidden');
},
/**
* Load recordings from the API
*/
async loadRecordings() {
this.elements.recordingsList.innerHTML = '<p class="recordings-empty">Loading recordings...</p>';
try {
const response = await fetch('/api/recordings');
if (!response.ok) throw new Error('Failed to load recordings');
this.recordings = await response.json();
this.renderRecordingsList();
} catch (error) {
console.error('Error loading recordings:', error);
this.elements.recordingsList.innerHTML =
'<p class="recordings-empty">Failed to load recordings</p>';
}
},
/**
* Render the recordings list
*/
renderRecordingsList() {
if (this.recordings.length === 0) {
this.elements.recordingsList.innerHTML =
'<p class="recordings-empty">No recordings yet.<br>Enable auto-save and record some captions!</p>';
return;
}
const html = this.recordings.map(recording => `
<div class="recording-item" data-filename="${recording.filename}">
<div class="recording-info">
<span class="recording-date">${recording.date}</span>
<span class="recording-meta">${this.formatFileSize(recording.size)}</span>
</div>
<span class="recording-arrow">&rsaquo;</span>
</div>
`).join('');
this.elements.recordingsList.innerHTML = html;
// Bind click events to items
this.elements.recordingsList.querySelectorAll('.recording-item').forEach(item => {
item.addEventListener('click', () => {
const filename = item.dataset.filename;
this.viewRecording(filename);
});
});
},
/**
* Format file size in human-readable format
*/
formatFileSize(bytes) {
if (bytes < 1024) return bytes + ' B';
if (bytes < 1024 * 1024) return (bytes / 1024).toFixed(1) + ' KB';
return (bytes / (1024 * 1024)).toFixed(1) + ' MB';
},
/**
* View a specific recording
*/
async viewRecording(filename) {
try {
const response = await fetch(`/api/recordings/${encodeURIComponent(filename)}`);
if (!response.ok) throw new Error('Failed to load recording');
const data = await response.json();
this.currentRecording = filename;
this.elements.viewerFilename.textContent = filename;
this.elements.viewerContent.textContent = data.content;
this.showViewer();
} catch (error) {
console.error('Error loading recording:', error);
alert('Failed to load recording');
}
},
/**
* Delete the currently viewed recording
*/
async deleteCurrentRecording() {
if (!this.currentRecording) return;
if (!confirm('Are you sure you want to delete this recording?')) {
return;
}
try {
const response = await fetch(`/api/recordings/${encodeURIComponent(this.currentRecording)}`, {
method: 'DELETE'
});
if (!response.ok) throw new Error('Failed to delete recording');
// Remove from local list
this.recordings = this.recordings.filter(r => r.filename !== this.currentRecording);
this.currentRecording = null;
// Go back to list
this.showList();
this.renderRecordingsList();
} catch (error) {
console.error('Error deleting recording:', error);
alert('Failed to delete recording');
}
}
};
// Initialize when DOM is ready
document.addEventListener('DOMContentLoaded', () => {
Recordings.init();
});

static/js/settings.js Normal file
@@ -0,0 +1,259 @@
/**
* Settings Panel Module
* Handles user settings UI and persistence
*/
const Settings = {
// Current settings state
current: {},
// DOM elements
elements: {},
/**
* Initialize the settings module
*/
init() {
this.cacheElements();
this.bindEvents();
// Load persisted settings from the server on startup
this.fetchSettings();
},
/**
* Cache DOM element references
*/
cacheElements() {
this.elements = {
panel: document.getElementById('settings-panel'),
overlay: document.getElementById('overlay'),
btnSettings: document.getElementById('btn-settings'),
btnClose: document.getElementById('btn-close-settings'),
btnSave: document.getElementById('btn-save-settings'),
btnReset: document.getElementById('btn-reset-settings'),
// Text settings
fontFamily: document.getElementById('font-family'),
fontSize: document.getElementById('font-size'),
fontSizeValue: document.getElementById('font-size-value'),
fontWeight: document.getElementById('font-weight'),
textColor: document.getElementById('text-color'),
textAlign: document.getElementById('text-align'),
// Background settings
backgroundColor: document.getElementById('background-color'),
backgroundOpacity: document.getElementById('background-opacity'),
opacityValue: document.getElementById('opacity-value'),
borderRadius: document.getElementById('border-radius'),
radiusValue: document.getElementById('radius-value'),
padding: document.getElementById('padding'),
paddingValue: document.getElementById('padding-value'),
// Behavior settings
maxWords: document.getElementById('max-words'),
maxWordsValue: document.getElementById('max-words-value'),
// Caption display
captionContainer: document.getElementById('caption-container'),
};
},
/**
* Bind event listeners
*/
bindEvents() {
// Panel open/close
this.elements.btnSettings.addEventListener('click', () => this.openPanel());
this.elements.btnClose.addEventListener('click', () => this.closePanel());
this.elements.overlay.addEventListener('click', () => this.closePanel());
// Save/Reset
this.elements.btnSave.addEventListener('click', () => this.saveSettings());
this.elements.btnReset.addEventListener('click', () => this.resetSettings());
// Live preview on input change
const inputs = [
'fontFamily', 'fontSize', 'fontWeight', 'textColor', 'textAlign',
'backgroundColor', 'backgroundOpacity', 'borderRadius', 'padding',
'maxWords'
];
inputs.forEach(name => {
const element = this.elements[name];
if (element) {
element.addEventListener('input', () => this.updatePreview());
}
});
// Update value displays for range inputs
this.elements.fontSize.addEventListener('input', (e) => {
this.elements.fontSizeValue.textContent = e.target.value;
});
this.elements.backgroundOpacity.addEventListener('input', (e) => {
this.elements.opacityValue.textContent = e.target.value;
});
this.elements.borderRadius.addEventListener('input', (e) => {
this.elements.radiusValue.textContent = e.target.value;
});
this.elements.padding.addEventListener('input', (e) => {
this.elements.paddingValue.textContent = e.target.value;
});
this.elements.maxWords.addEventListener('input', (e) => {
this.elements.maxWordsValue.textContent = e.target.value;
});
},
/**
* Open settings panel
*/
openPanel() {
this.elements.panel.classList.remove('hidden');
this.elements.overlay.classList.remove('hidden');
},
/**
* Close settings panel
*/
closePanel() {
this.elements.panel.classList.add('hidden');
this.elements.overlay.classList.add('hidden');
},
/**
* Apply settings to the UI
*/
applySettings(settings) {
this.current = settings;
// Update form values
this.elements.fontFamily.value = settings.font_family;
this.elements.fontSize.value = settings.font_size;
this.elements.fontSizeValue.textContent = settings.font_size;
this.elements.fontWeight.value = settings.font_weight;
this.elements.textColor.value = settings.text_color;
this.elements.textAlign.value = settings.text_align;
this.elements.backgroundColor.value = settings.background_color;
this.elements.backgroundOpacity.value = Math.round(settings.background_opacity * 100);
this.elements.opacityValue.textContent = Math.round(settings.background_opacity * 100);
this.elements.borderRadius.value = settings.border_radius;
this.elements.radiusValue.textContent = settings.border_radius;
this.elements.padding.value = settings.padding;
this.elements.paddingValue.textContent = settings.padding;
this.elements.maxWords.value = settings.max_words || 30;
this.elements.maxWordsValue.textContent = settings.max_words || 30;
// Apply to caption container
this.updatePreview();
},
/**
* Update live preview of caption styling
*/
updatePreview() {
const container = this.elements.captionContainer;
const opacity = this.elements.backgroundOpacity.value / 100;
// Parse background color and apply opacity
const bgColor = this.elements.backgroundColor.value;
const r = parseInt(bgColor.slice(1, 3), 16);
const g = parseInt(bgColor.slice(3, 5), 16);
const b = parseInt(bgColor.slice(5, 7), 16);
container.style.fontFamily = this.elements.fontFamily.value;
container.style.fontSize = `${this.elements.fontSize.value}px`;
container.style.fontWeight = this.elements.fontWeight.value;
container.style.color = this.elements.textColor.value;
container.style.textAlign = this.elements.textAlign.value;
container.style.backgroundColor = `rgba(${r}, ${g}, ${b}, ${opacity})`;
container.style.borderRadius = `${this.elements.borderRadius.value}px`;
container.style.padding = `${this.elements.padding.value}px`;
// Store max words for caption management
this.current.max_words = parseInt(this.elements.maxWords.value);
},
/**
* Get current form values as settings object
*/
getFormValues() {
return {
font_family: this.elements.fontFamily.value,
font_size: parseInt(this.elements.fontSize.value),
font_weight: this.elements.fontWeight.value,
text_color: this.elements.textColor.value,
text_align: this.elements.textAlign.value,
background_color: this.elements.backgroundColor.value,
background_opacity: this.elements.backgroundOpacity.value / 100,
border_radius: parseInt(this.elements.borderRadius.value),
padding: parseInt(this.elements.padding.value),
max_words: parseInt(this.elements.maxWords.value),
};
},
/**
* Save settings to server
*/
async saveSettings() {
const settings = this.getFormValues();
try {
const response = await fetch('/api/settings', {
method: 'PUT',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(settings),
});
if (response.ok) {
this.current = await response.json();
this.closePanel();
console.log('Settings saved');
} else {
console.error('Failed to save settings');
}
} catch (error) {
console.error('Error saving settings:', error);
}
},
/**
* Reset settings to defaults
*/
async resetSettings() {
if (!confirm('Reset all settings to defaults?')) {
return;
}
try {
const response = await fetch('/api/settings/reset', {
method: 'POST',
});
if (response.ok) {
const settings = await response.json();
this.applySettings(settings);
console.log('Settings reset to defaults');
} else {
console.error('Failed to reset settings');
}
} catch (error) {
console.error('Error resetting settings:', error);
}
},
/**
* Fetch settings from server
*/
async fetchSettings() {
try {
const response = await fetch('/api/settings');
if (response.ok) {
const settings = await response.json();
this.applySettings(settings);
}
} catch (error) {
console.error('Error fetching settings:', error);
}
}
};

templates/index.html Normal file
@@ -0,0 +1,159 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Live Captions</title>
<link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
<div id="app">
<!-- Caption Display Area -->
<div id="caption-container">
<div id="captions"></div>
</div>
<!-- Controls Bar -->
<div id="controls">
<button id="btn-start" class="btn btn-primary">
<span class="icon">&#9658;</span> Start
</button>
<button id="btn-stop" class="btn btn-danger" disabled>
<span class="icon">&#9632;</span> Stop
</button>
<button id="btn-clear" class="btn btn-secondary">
Clear
</button>
<label class="toggle-switch" title="Auto-save recordings">
<input type="checkbox" id="auto-save-toggle">
<span class="toggle-slider"></span>
<span class="toggle-label">Auto-save</span>
</label>
<button id="btn-recordings" class="btn btn-icon" title="Recordings">
&#128203;
</button>
<button id="btn-settings" class="btn btn-icon" title="Settings">
&#9881;
</button>
</div>
<!-- Status Indicator -->
<div id="status">
<span id="status-dot" class="dot"></span>
<span id="status-text">Ready</span>
</div>
<!-- Settings Panel -->
<div id="settings-panel" class="panel hidden">
<div class="panel-header">
<h2>Settings</h2>
<button id="btn-close-settings" class="btn-close">&times;</button>
</div>
<div class="panel-content">
<!-- Font Settings -->
<div class="setting-group">
<h3>Text</h3>
<label for="font-family">Font Family</label>
<select id="font-family">
<option value="Arial, sans-serif">Arial</option>
<option value="'Helvetica Neue', Helvetica, sans-serif">Helvetica</option>
<option value="'Segoe UI', sans-serif">Segoe UI</option>
<option value="'Roboto', sans-serif">Roboto</option>
<option value="'Open Sans', sans-serif">Open Sans</option>
<option value="Georgia, serif">Georgia</option>
<option value="'Times New Roman', serif">Times New Roman</option>
<option value="'Courier New', monospace">Courier New</option>
<option value="monospace">Monospace</option>
</select>
<label for="font-size">Font Size: <span id="font-size-value">32</span>px</label>
<input type="range" id="font-size" min="16" max="72" value="32">
<label for="font-weight">Font Weight</label>
<select id="font-weight">
<option value="normal">Normal</option>
<option value="bold">Bold</option>
<option value="lighter">Light</option>
</select>
<label for="text-color">Text Color</label>
<input type="color" id="text-color" value="#ffffff">
<label for="text-align">Text Alignment</label>
<select id="text-align">
<option value="left">Left</option>
<option value="center">Center</option>
<option value="right">Right</option>
</select>
</div>
<!-- Background Settings -->
<div class="setting-group">
<h3>Background</h3>
<label for="background-color">Background Color</label>
<input type="color" id="background-color" value="#1a1a2e">
<label for="background-opacity">Opacity: <span id="opacity-value">90</span>%</label>
<input type="range" id="background-opacity" min="0" max="100" value="90">
<label for="border-radius">Corner Radius: <span id="radius-value">10</span>px</label>
<input type="range" id="border-radius" min="0" max="30" value="10">
<label for="padding">Padding: <span id="padding-value">20</span>px</label>
<input type="range" id="padding" min="5" max="50" value="20">
</div>
<!-- Caption Behavior -->
<div class="setting-group">
<h3>Behavior</h3>
<label for="max-words">Max Words: <span id="max-words-value">30</span></label>
<input type="range" id="max-words" min="1" max="100" value="30">
</div>
<!-- Actions -->
<div class="setting-actions">
<button id="btn-save-settings" class="btn btn-primary">Save Settings</button>
<button id="btn-reset-settings" class="btn btn-secondary">Reset to Defaults</button>
</div>
</div>
</div>
<!-- Recordings Panel -->
<div id="recordings-panel" class="panel hidden">
<div class="panel-header">
<h2>Recordings</h2>
<button id="btn-close-recordings" class="btn-close">&times;</button>
</div>
<div class="panel-content">
<!-- Recordings List -->
<div id="recordings-list" class="recordings-list">
<p class="recordings-empty">Loading recordings...</p>
</div>
<!-- Recording Viewer -->
<div id="recording-viewer" class="recording-viewer hidden">
<div class="viewer-header">
<button id="btn-back-to-list" class="btn btn-secondary btn-small">&larr; Back</button>
<span id="viewer-filename" class="viewer-filename"></span>
</div>
<div id="viewer-content" class="viewer-content"></div>
<div class="viewer-actions">
<button id="btn-delete-recording" class="btn btn-danger btn-small">Delete</button>
</div>
</div>
</div>
</div>
<!-- Overlay for panels -->
<div id="overlay" class="hidden"></div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.7.2/socket.io.min.js"></script>
<script src="/static/js/settings.js"></script>
<script src="/static/js/recordings.js"></script>
<script src="/static/js/app.js"></script>
</body>
</html>

transcriber.py Normal file
@@ -0,0 +1,102 @@
"""
Whisper transcription module using faster-whisper.
"""
import os
import io
import tempfile
import logging
from faster_whisper import WhisperModel
from pydub import AudioSegment
logger = logging.getLogger(__name__)
# Global model instance (loaded once)
_model = None
def get_model():
"""Get or initialize the Whisper model."""
global _model
if _model is None:
model_size = os.environ.get('WHISPER_MODEL', 'base')
device = os.environ.get('WHISPER_DEVICE', 'cpu')
compute_type = os.environ.get('WHISPER_COMPUTE_TYPE', 'int8')
logger.info(f"Loading Whisper model: {model_size} on {device} ({compute_type})")
_model = WhisperModel(
model_size,
device=device,
compute_type=compute_type
)
logger.info("Whisper model loaded successfully")
return _model
def transcribe_audio(audio_bytes, format='webm'):
"""
Transcribe audio bytes to text.
Args:
audio_bytes: Raw audio data
format: Audio format (default: webm)
Returns:
Transcribed text string
"""
if not audio_bytes:
return ""
try:
# Decode the incoming audio (pydub shells out to ffmpeg for webm/opus)
audio = AudioSegment.from_file(
io.BytesIO(audio_bytes),
format=format
)
# Convert to 16kHz mono WAV (Whisper's expected format)
audio = audio.set_frame_rate(16000).set_channels(1)
# Export to temporary file (faster-whisper needs a file path)
with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:
audio.export(tmp.name, format='wav')
tmp_path = tmp.name
try:
# Transcribe
model = get_model()
segments, info = model.transcribe(
tmp_path,
beam_size=5,
vad_filter=True,
vad_parameters=dict(
min_silence_duration_ms=500
)
)
# Combine all segments into text
text = ' '.join(segment.text.strip() for segment in segments)
return text.strip()
finally:
# Clean up temp file
if os.path.exists(tmp_path):
os.unlink(tmp_path)
except Exception as e:
logger.error(f"Transcription error: {e}")
return ""
def preload_model():
"""Preload the model during startup."""
try:
get_model()
return True
except Exception as e:
logger.error(f"Failed to preload model: {e}")
return False
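
The client (`sendAudioBlob()` in static/js/app.js) base64-encodes each WebM blob and emits it on the `audio_data` event; a server-side handler would reverse that encoding before calling `transcribe_audio()`. A minimal sketch of the decode step, using a stand-in payload (the helper name and the fake bytes are illustrative, not part of this commit):

```python
import base64

# Shape of the message emitted by sendAudioBlob() in static/js/app.js:
# { "audio": "<base64, data-URL prefix already stripped>", "format": "webm" }
def decode_audio_payload(payload):
    """Recover raw audio bytes from the client's base64 payload."""
    return base64.b64decode(payload["audio"])

# Stand-in payload; real data would be a WebM/Opus blob from the browser.
fake_webm = b"\x1aE\xdf\xa3fake-webm-bytes"  # 0x1A45DFA3 is the EBML magic
payload = {
    "audio": base64.b64encode(fake_webm).decode("ascii"),
    "format": "webm",
}

audio_bytes = decode_audio_payload(payload)
assert audio_bytes == fake_webm
# audio_bytes would then be handed to
# transcribe_audio(audio_bytes, format=payload["format"]).
```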