Initial commit: Live Captions web application

Real-time speech-to-text using OpenAI Whisper (faster-whisper).
Features browser audio capture, WebSocket streaming, and customizable display settings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
bunker-admin 2026-01-12 08:53:40 -07:00
commit c7becf330c
18 changed files with 2633 additions and 0 deletions

33
.env.example Normal file
View File

@@ -0,0 +1,33 @@
# Server settings
HOST=0.0.0.0
PORT=5000
DEBUG=false
# Whisper settings
WHISPER_MODEL=base
# Device: cpu or cuda (for NVIDIA GPU)
WHISPER_DEVICE=cpu
# Compute type:
# CPU: int8 (fastest), float32
# GPU: float16 (recommended), int8_float16, float32
WHISPER_COMPUTE_TYPE=int8
# Audio settings
AUDIO_CHUNK_DURATION=3
AUDIO_SAMPLE_RATE=16000
# Database
DATABASE_PATH=data/settings.db
# =============================================================================
# GPU Configuration (optional)
# =============================================================================
# To enable NVIDIA GPU support:
# 1. Install NVIDIA Container Toolkit (see CLAUDE.md for instructions)
# 2. Set WHISPER_DEVICE=cuda
# 3. Set WHISPER_COMPUTE_TYPE=float16 (recommended for GPU)
# 4. Run with: docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
#
# Example GPU settings:
# WHISPER_DEVICE=cuda
# WHISPER_COMPUTE_TYPE=float16

28
.gitignore vendored Normal file
View File

@@ -0,0 +1,28 @@
# Environment
.env
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
ENV/
# Data
data/
recordings/
# IDE
.vscode/
.idea/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Whisper models cache (if running locally)
.cache/

154
CLAUDE.md Normal file
View File

@@ -0,0 +1,154 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Live Captions is a Dockerized web application that provides real-time speech-to-text captions using OpenAI's Whisper model (via faster-whisper). It captures microphone audio in the browser, streams it to a Flask backend for transcription, and displays captions with customizable styling.
## Commands
### Development
```bash
# Build and run (primary development command)
docker compose up --build
# Run in background
docker compose up -d --build
# View logs
docker compose logs -f
# Stop
docker compose down
# Reset all data (database + cached models)
docker compose down -v
```
### First-time setup
```bash
cp .env.example .env
docker compose up --build
```
## Architecture
```
Browser Docker Container
┌─────────────────────┐ ┌─────────────────────────────┐
│ MediaRecorder API │ │ Flask + Flask-SocketIO │
│ (1.5s audio chunks)│ ──────► │ (app.py) │
│ │ WebSocket│ │ │
│ Caption Display │ ◄────── │ faster-whisper transcriber │
│ (word-by-word) │ │ (transcriber.py) │
│ │ │ │ │
│ Settings Panel │ ──────► │ SQLite settings persistence│
│ │ REST API│ (database.py) │
└─────────────────────┘ └─────────────────────────────┘
```
### Data Flow
1. Browser captures mic audio using MediaRecorder, sends base64-encoded WebM chunks every 1.5s via WebSocket
2. Backend converts WebM→WAV using pydub/ffmpeg, transcribes with faster-whisper
3. Transcribed text sent back via WebSocket `transcription` event
4. Frontend animates words appearing one-by-one for streaming effect
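The encode/decode handshake in steps 1–2 can be sketched in isolation. This is a minimal illustration, not the app's actual code: only the `{audio, format}` payload shape and the base64/bytes handling come from `app.py`; the helper names are made up.

```python
import base64

def encode_chunk(raw: bytes) -> dict:
    # Client side (step 1): base64-encode a WebM chunk into the audio_data payload.
    return {"audio": base64.b64encode(raw).decode("ascii"), "format": "webm"}

def decode_chunk(payload: dict) -> bytes:
    # Server side (step 2): recover raw bytes before the WebM -> WAV conversion.
    audio = payload["audio"]
    if isinstance(audio, str):  # base64 text, as the browser sends it
        return base64.b64decode(audio)
    return audio  # already raw bytes

chunk = b"\x1aE\xdf\xa3fake-webm"
assert decode_chunk(encode_chunk(chunk)) == chunk
```

The roundtrip mirrors the `isinstance(audio_bytes, str)` branch in `handle_audio_data`, which accepts either base64 text or raw bytes.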
### Key Files
- **app.py**: Flask server with SocketIO WebSocket handlers and REST API for settings
- **transcriber.py**: Whisper model loading and audio transcription (singleton model instance)
- **database.py**: SQLite CRUD for user display preferences
- **static/js/app.js**: Audio capture, WebSocket client, word animation queue
- **static/js/settings.js**: Settings panel UI and persistence
## Configuration
Environment variables in `.env`:
- `WHISPER_MODEL`: Model size (tiny/base/small/medium/large) - affects accuracy vs speed
- `WHISPER_DEVICE`: cpu or cuda
- `WHISPER_COMPUTE_TYPE`: int8/float16/float32
User display settings stored in SQLite (`data/settings.db`):
- Font family, size, weight, color
- Background color, opacity, border radius, padding
- Max words (controls caption buffer length)
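A minimal sketch of reading these variables with their `.env.example` defaults. The variable names and default values come from the files above; the helper itself is illustrative and not part of the codebase:

```python
import os

def load_whisper_config(env=None) -> dict:
    # Defaults mirror .env.example: base model, CPU, int8 compute type.
    env = os.environ if env is None else env
    return {
        "model": env.get("WHISPER_MODEL", "base"),
        "device": env.get("WHISPER_DEVICE", "cpu"),
        "compute_type": env.get("WHISPER_COMPUTE_TYPE", "int8"),
    }

# GPU-style override, matching the example settings in .env.example
cfg = load_whisper_config({"WHISPER_DEVICE": "cuda", "WHISPER_COMPUTE_TYPE": "float16"})
assert cfg["device"] == "cuda"
```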
## API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/` | GET | Main UI |
| `/api/health` | GET | Health check |
| `/api/settings` | GET/PUT | Read/update user settings |
| `/api/settings/reset` | POST | Reset to defaults |
## WebSocket Events
| Event | Direction | Payload |
|-------|-----------|---------|
| `audio_data` | client→server | `{audio: base64, format: 'webm'}` |
| `transcription` | server→client | `{text: string}` |
| `settings_updated` | server→client | settings object |
## Volumes
- `./data:/app/data` - SQLite database persistence
- `whisper-models` - Cached Whisper model files (~140MB for base)
## NVIDIA GPU Support
GPU acceleration significantly improves transcription speed. Follow these steps to enable it.
### Prerequisites
1. NVIDIA GPU with CUDA support
2. NVIDIA driver installed (`nvidia-smi` should work)
3. Docker installed
### Install NVIDIA Container Toolkit
```bash
# Add NVIDIA package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify installation
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
### Configure for GPU
1. Update `.env`:
```env
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
```
2. Run with GPU support:
```bash
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
```
### GPU Compute Types
| Type | Speed | Memory | Notes |
|------|-------|--------|-------|
| `float16` | Fast | Medium | Recommended for most GPUs |
| `int8_float16` | Faster | Lower | Good balance |
| `float32` | Slower | Higher | Maximum precision |
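The table's recommendations can be captured in a small helper. This is illustrative only; the mapping is taken directly from the table and the `.env.example` comments:

```python
def recommended_compute_type(device: str, low_memory: bool = False) -> str:
    # float16 is the recommended GPU type; int8_float16 trades a little
    # precision for lower memory; int8 is the CPU default from .env.example.
    if device == "cuda":
        return "int8_float16" if low_memory else "float16"
    return "int8"

assert recommended_compute_type("cuda") == "float16"
assert recommended_compute_type("cpu") == "int8"
```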
### Troubleshooting
- **"could not select device driver"**: NVIDIA Container Toolkit not installed or Docker not restarted
- **CUDA out of memory**: Try a smaller model (`WHISPER_MODEL=small` or `tiny`)
- **Verify GPU access**: `docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi`

45
Dockerfile Normal file
View File

@@ -0,0 +1,45 @@
FROM python:3.11-slim
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
libsndfile1 \
&& rm -rf /var/lib/apt/lists/*
# Create app directory
WORKDIR /app
# Create non-root user
RUN useradd -m -u 1000 appuser
# Copy requirements first for better caching
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create data and recordings directories
RUN mkdir -p /app/data /app/recordings && chown -R appuser:appuser /app
# Create directory for Whisper models cache
RUN mkdir -p /home/appuser/.cache/huggingface && chown -R appuser:appuser /home/appuser
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/api/health')" || exit 1
# Run the application
CMD ["python", "app.py"]

54
Dockerfile.gpu Normal file
View File

@@ -0,0 +1,54 @@
# GPU-enabled Dockerfile for NVIDIA CUDA support
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV DEBIAN_FRONTEND=noninteractive
# Install Python and system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.11 \
python3.11-venv \
python3-pip \
ffmpeg \
libsndfile1 \
&& rm -rf /var/lib/apt/lists/*
# Set Python 3.11 as default
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1 \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
# Create app directory
WORKDIR /app
# Create non-root user
RUN useradd -m -u 1000 appuser
# Copy requirements first for better caching
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create data and recordings directories
RUN mkdir -p /app/data /app/recordings && chown -R appuser:appuser /app
# Create directory for Whisper models cache
RUN mkdir -p /home/appuser/.cache/huggingface && chown -R appuser:appuser /home/appuser
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/api/health')" || exit 1
# Run the application
CMD ["python", "app.py"]

3
README.MD Normal file
View File

@@ -0,0 +1,3 @@
# Live Captions
Live Captions is a project that displays live captions on screen, in a small, customizable browser window, entirely locally.

235
app.py Normal file
View File

@@ -0,0 +1,235 @@
"""
Live Captions - Flask Application
A web-based live captioning application using Whisper for speech recognition.
"""
import os
import logging
from datetime import datetime
from flask import Flask, render_template, jsonify, request
from flask_socketio import SocketIO, emit
from dotenv import load_dotenv
import database
import transcriber
import recordings
# Load environment variables
load_dotenv()
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Initialize Flask app
app = Flask(__name__)
app.config['SECRET_KEY'] = os.environ.get('SECRET_KEY', 'live-captions-secret')
# Initialize SocketIO with gevent
socketio = SocketIO(
app,
cors_allowed_origins="*",
async_mode='gevent'
)
# =============================================================================
# Routes
# =============================================================================
@app.route('/')
def index():
"""Serve the main page."""
return render_template('index.html')
@app.route('/api/health')
def health():
"""Health check endpoint."""
return jsonify({'status': 'healthy'})
@app.route('/api/settings', methods=['GET'])
def get_settings():
"""Get current user settings."""
settings = database.get_settings()
return jsonify(settings)
@app.route('/api/settings', methods=['PUT'])
def update_settings():
"""Update user settings."""
data = request.get_json()
if not data:
return jsonify({'error': 'No data provided'}), 400
settings = database.update_settings(data)
# Broadcast settings update to all clients
socketio.emit('settings_updated', settings)
return jsonify(settings)
@app.route('/api/settings/reset', methods=['POST'])
def reset_settings():
"""Reset settings to defaults."""
settings = database.reset_settings()
# Broadcast settings update to all clients
socketio.emit('settings_updated', settings)
return jsonify(settings)
@app.route('/api/recordings', methods=['GET'])
def list_recordings():
"""List all saved recordings."""
return jsonify(recordings.list_recordings())
@app.route('/api/recordings/<filename>', methods=['GET'])
def get_recording(filename):
"""Get a specific recording's content."""
recording = recordings.get_recording(filename)
if recording:
return jsonify(recording)
return jsonify({'error': 'Recording not found'}), 404
@app.route('/api/recordings/<filename>', methods=['DELETE'])
def delete_recording(filename):
"""Delete a specific recording."""
if recordings.delete_recording(filename):
return jsonify({'success': True})
return jsonify({'error': 'Failed to delete recording'}), 400
# =============================================================================
# WebSocket Events
# =============================================================================
@socketio.on('connect')
def handle_connect():
"""Handle client connection."""
logger.info(f"Client connected: {request.sid}")
# Send current settings to the newly connected client
settings = database.get_settings()
emit('settings_updated', settings)
@socketio.on('disconnect')
def handle_disconnect():
"""Handle client disconnection."""
logger.info(f"Client disconnected: {request.sid}")
@socketio.on('audio_data')
def handle_audio_data(data):
"""
Handle incoming audio data from client.
Args:
data: Dictionary containing 'audio' (base64 or bytes) and 'format'
"""
try:
audio_bytes = data.get('audio')
audio_format = data.get('format', 'webm')
if not audio_bytes:
return
# Handle base64 encoded audio
if isinstance(audio_bytes, str):
import base64
audio_bytes = base64.b64decode(audio_bytes)
# Transcribe audio
text = transcriber.transcribe_audio(audio_bytes, format=audio_format)
if text:
logger.info(f"Transcription: {text}")
emit('transcription', {'text': text})
except Exception as e:
logger.error(f"Error processing audio: {e}")
emit('error', {'message': 'Failed to process audio'})
@socketio.on('save_recording')
def handle_save_recording(data):
"""Handle saving a recording session."""
client_id = request.sid
try:
# Parse timestamps from client
start_time_str = data.get('startTime')
end_time_str = data.get('endTime')
if start_time_str:
start_time = datetime.fromisoformat(start_time_str.replace('Z', '+00:00'))
else:
start_time = datetime.now()
if end_time_str:
end_time = datetime.fromisoformat(end_time_str.replace('Z', '+00:00'))
else:
end_time = datetime.now()
transcript = data.get('transcript', '')
word_count = data.get('wordCount', 0)
# Save the recording
filename = recordings.save_recording(
start_time=start_time,
end_time=end_time,
transcript=transcript,
word_count=word_count,
client_id=client_id
)
if filename:
logger.info(f"Recording saved: {filename}")
emit('recording_saved', {'filename': filename})
else:
emit('recording_error', {'message': 'Failed to save recording'})
except Exception as e:
logger.error(f"Error saving recording: {e}")
emit('recording_error', {'message': str(e)})
# =============================================================================
# Startup
# =============================================================================
def initialize():
"""Initialize application components."""
logger.info("Initializing Live Captions...")
# Initialize database
database.init_db()
logger.info("Database initialized")
# Preload Whisper model
logger.info("Preloading Whisper model (this may take a moment)...")
if transcriber.preload_model():
logger.info("Whisper model ready")
else:
logger.warning("Failed to preload Whisper model")
if __name__ == '__main__':
initialize()
host = os.environ.get('HOST', '0.0.0.0')
port = int(os.environ.get('PORT', 5000))
debug = os.environ.get('DEBUG', 'false').lower() == 'true'
logger.info(f"Starting Live Captions on {host}:{port}")
socketio.run(app, host=host, port=port, debug=debug)

168
database.py Normal file
View File

@@ -0,0 +1,168 @@
"""
SQLite database module for user settings persistence.
"""
import sqlite3
import os
from datetime import datetime
# Default settings
DEFAULT_SETTINGS = {
'font_family': 'Arial, sans-serif',
'font_size': 32,
'font_weight': 'normal',
'text_color': '#ffffff',
'background_color': '#1a1a2e',
'background_opacity': 0.9,
'max_words': 30,
'text_align': 'center',
'padding': 20,
'border_radius': 10,
}
def get_db_path():
"""Get database path from environment or use default."""
return os.environ.get('DATABASE_PATH', 'data/settings.db')
def get_connection():
"""Create a database connection."""
db_path = get_db_path()
# Ensure directory exists (db_path may be a bare filename with no directory part)
db_dir = os.path.dirname(db_path)
if db_dir:
os.makedirs(db_dir, exist_ok=True)
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
return conn
def init_db():
"""Initialize the database with the settings table."""
conn = get_connection()
cursor = conn.cursor()
# Check if table exists
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='user_settings'")
table_exists = cursor.fetchone() is not None
if table_exists:
# Check if we need to migrate from max_lines to max_words
cursor.execute("PRAGMA table_info(user_settings)")
columns = [col[1] for col in cursor.fetchall()]
if 'max_lines' in columns and 'max_words' not in columns:
# Add max_words column
cursor.execute('ALTER TABLE user_settings ADD COLUMN max_words INTEGER DEFAULT 30')
conn.commit()
# Remove old columns that are no longer needed (fade_delay, max_lines)
# SQLite doesn't support DROP COLUMN easily, so we just ignore old columns
else:
# Create settings table
cursor.execute('''
CREATE TABLE IF NOT EXISTS user_settings (
id INTEGER PRIMARY KEY DEFAULT 1,
font_family TEXT DEFAULT 'Arial, sans-serif',
font_size INTEGER DEFAULT 32,
font_weight TEXT DEFAULT 'normal',
text_color TEXT DEFAULT '#ffffff',
background_color TEXT DEFAULT '#1a1a2e',
background_opacity REAL DEFAULT 0.9,
max_words INTEGER DEFAULT 30,
text_align TEXT DEFAULT 'center',
padding INTEGER DEFAULT 20,
border_radius INTEGER DEFAULT 10,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Insert default settings if table is empty
cursor.execute('SELECT COUNT(*) FROM user_settings')
if cursor.fetchone()[0] == 0:
columns = ', '.join(DEFAULT_SETTINGS.keys())
placeholders = ', '.join(['?' for _ in DEFAULT_SETTINGS])
cursor.execute(
f'INSERT INTO user_settings ({columns}) VALUES ({placeholders})',
list(DEFAULT_SETTINGS.values())
)
conn.commit()
conn.close()
def get_settings():
"""Fetch current user settings."""
conn = get_connection()
cursor = conn.cursor()
cursor.execute('SELECT * FROM user_settings WHERE id = 1')
row = cursor.fetchone()
conn.close()
if row:
# Convert to dict and exclude id and timestamps
settings = dict(row)
for key in ['id', 'created_at', 'updated_at', 'max_lines', 'fade_delay']:
settings.pop(key, None)
# Ensure max_words exists (for migration)
if 'max_words' not in settings:
settings['max_words'] = DEFAULT_SETTINGS['max_words']
return settings
return DEFAULT_SETTINGS.copy()
def update_settings(settings_dict):
"""Update user settings with provided values."""
if not settings_dict:
return get_settings()
conn = get_connection()
cursor = conn.cursor()
# Build UPDATE query with only valid columns
valid_columns = set(DEFAULT_SETTINGS.keys())
updates = []
values = []
for key, value in settings_dict.items():
if key in valid_columns:
updates.append(f'{key} = ?')
values.append(value)
if updates:
updates.append('updated_at = ?')
values.append(datetime.now().isoformat())
query = f'UPDATE user_settings SET {", ".join(updates)} WHERE id = 1'
cursor.execute(query, values)
conn.commit()
conn.close()
return get_settings()
def reset_settings():
"""Reset all settings to defaults."""
conn = get_connection()
cursor = conn.cursor()
# Delete existing and insert defaults
cursor.execute('DELETE FROM user_settings')
columns = ', '.join(DEFAULT_SETTINGS.keys())
placeholders = ', '.join(['?' for _ in DEFAULT_SETTINGS])
cursor.execute(
f'INSERT INTO user_settings ({columns}) VALUES ({placeholders})',
list(DEFAULT_SETTINGS.values())
)
conn.commit()
conn.close()
return DEFAULT_SETTINGS.copy()

21
docker-compose.gpu.yml Normal file
View File

@@ -0,0 +1,21 @@
# GPU override for docker-compose
# Usage: docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
#
# Prerequisites:
# 1. NVIDIA GPU with driver installed
# 2. NVIDIA Container Toolkit installed
# 3. Set WHISPER_DEVICE=cuda in .env
# 4. Set WHISPER_COMPUTE_TYPE=float16 in .env (recommended)
services:
live-captions:
build:
context: .
dockerfile: Dockerfile.gpu
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]

29
docker-compose.yml Normal file
View File

@@ -0,0 +1,29 @@
services:
live-captions:
build: .
container_name: live-captions
ports:
- "${PORT:-5000}:5000"
volumes:
# Persist SQLite database
- ./data:/app/data
# Persist Whisper models
- whisper-models:/home/appuser/.cache/huggingface
# Persist recordings
- ./recordings:/app/recordings
env_file:
- .env
environment:
- HOST=0.0.0.0
- PORT=5000
restart: unless-stopped
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:5000/api/health')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
volumes:
whisper-models:
name: live-captions-whisper-models

208
recordings.py Normal file
View File

@@ -0,0 +1,208 @@
"""
Recording session management and file saving.
"""
import os
import logging
from datetime import datetime
from typing import Optional
logger = logging.getLogger(__name__)
# Default recordings directory
RECORDINGS_DIR = os.environ.get('RECORDINGS_PATH', '/app/recordings')
def ensure_recordings_dir():
"""Ensure the recordings directory exists."""
os.makedirs(RECORDINGS_DIR, exist_ok=True)
return RECORDINGS_DIR
def generate_filename(start_time: datetime) -> str:
"""
Generate a filename from the session start time.
Format: YYYY-MM-DD_HH-MM-SS_captions.md
"""
return start_time.strftime('%Y-%m-%d_%H-%M-%S_captions.md')
def calculate_duration(start_time: datetime, end_time: datetime) -> str:
"""Calculate and format duration as HH:MM:SS."""
delta = end_time - start_time
total_seconds = int(delta.total_seconds())
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)
return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
def get_whisper_model_name() -> str:
"""Get the configured Whisper model name."""
return os.environ.get('WHISPER_MODEL', 'base')
def save_recording(
start_time: datetime,
end_time: datetime,
transcript: str,
word_count: int,
client_id: str
) -> Optional[str]:
"""
Save a recording session to a markdown file.
Args:
start_time: Session start datetime
end_time: Session end datetime
transcript: Full transcript text
word_count: Number of words in transcript
client_id: WebSocket client session ID
Returns:
Filename if successful, None if failed
"""
try:
ensure_recordings_dir()
filename = generate_filename(start_time)
filepath = os.path.join(RECORDINGS_DIR, filename)
duration = calculate_duration(start_time, end_time)
model_name = get_whisper_model_name()
# Build markdown content with frontmatter
content = f"""---
session_start: {start_time.isoformat()}
session_end: {end_time.isoformat()}
duration: {duration}
whisper_model: {model_name}
word_count: {word_count}
---
# Live Captions Recording
**Session Start:** {start_time.strftime('%Y-%m-%d %H:%M:%S')}
**Session End:** {end_time.strftime('%Y-%m-%d %H:%M:%S')}
**Duration:** {duration}
**Model:** {model_name}
**Words:** {word_count}
---
## Transcript
{transcript}
"""
with open(filepath, 'w', encoding='utf-8') as f:
f.write(content)
logger.info(f"Recording saved: {filename} ({word_count} words)")
return filename
except Exception as e:
logger.error(f"Failed to save recording: {e}")
return None
def list_recordings() -> list:
"""
List all recording files, sorted by date descending.
Returns:
List of recording metadata dicts
"""
ensure_recordings_dir()
recordings = []
try:
for filename in os.listdir(RECORDINGS_DIR):
if filename.endswith('_captions.md'):
filepath = os.path.join(RECORDINGS_DIR, filename)
stat = os.stat(filepath)
# Parse date from filename (YYYY-MM-DD_HH-MM-SS_captions.md)
try:
date_str = filename.replace('_captions.md', '')
date_parts = date_str.split('_')
display_date = f"{date_parts[0]} {date_parts[1].replace('-', ':')}"
except (IndexError, ValueError):
display_date = filename
recordings.append({
'filename': filename,
'date': display_date,
'size': stat.st_size,
'created': datetime.fromtimestamp(stat.st_mtime).isoformat()
})
# Sort by filename descending (newest first)
recordings.sort(key=lambda x: x['filename'], reverse=True)
except Exception as e:
logger.error(f"Failed to list recordings: {e}")
return recordings
def get_recording(filename: str) -> Optional[dict]:
"""
Get a specific recording's content.
Args:
filename: The recording filename
Returns:
Dict with filename and content, or None if not found
"""
ensure_recordings_dir()
# Sanitize filename to prevent path traversal
safe_filename = os.path.basename(filename)
if not safe_filename.endswith('_captions.md'):
return None
filepath = os.path.join(RECORDINGS_DIR, safe_filename)
try:
if os.path.exists(filepath):
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read()
return {
'filename': safe_filename,
'content': content
}
except Exception as e:
logger.error(f"Failed to read recording {safe_filename}: {e}")
return None
def delete_recording(filename: str) -> bool:
"""
Delete a specific recording.
Args:
filename: The recording filename
Returns:
True if deleted, False otherwise
"""
ensure_recordings_dir()
# Sanitize filename to prevent path traversal
safe_filename = os.path.basename(filename)
if not safe_filename.endswith('_captions.md'):
return False
filepath = os.path.join(RECORDINGS_DIR, safe_filename)
try:
if os.path.exists(filepath):
os.remove(filepath)
logger.info(f"Recording deleted: {safe_filename}")
return True
except Exception as e:
logger.error(f"Failed to delete recording {safe_filename}: {e}")
return False

9
requirements.txt Normal file
View File

@@ -0,0 +1,9 @@
flask>=3.0.0
flask-socketio>=5.3.0
faster-whisper>=1.0.0
pydub>=0.25.1
python-dotenv>=1.0.0
python-engineio>=4.8.0
python-socketio>=5.10.0
gevent>=24.2.1
gevent-websocket>=0.10.1

567
static/css/style.css Normal file
View File

@@ -0,0 +1,567 @@
/**
* Live Captions - Stylesheet
*/
/* Reset and Base */
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
:root {
--bg-primary: #0d0d1a;
--bg-secondary: #1a1a2e;
--bg-tertiary: #252542;
--text-primary: #ffffff;
--text-secondary: #a0a0b0;
--accent: #4a9eff;
--accent-hover: #6ab0ff;
--danger: #ff4a6a;
--danger-hover: #ff6a85;
--success: #4aff8a;
--warning: #ffa64a;
--border-radius: 8px;
--transition: 0.2s ease;
}
html, body {
height: 100%;
font-family: 'Segoe UI', system-ui, -apple-system, sans-serif;
background-color: var(--bg-primary);
color: var(--text-primary);
}
#app {
display: flex;
flex-direction: column;
height: 100vh;
padding: 20px;
}
/* Caption Container */
#caption-container {
flex: 1;
display: flex;
flex-direction: column;
justify-content: center;
background-color: rgba(26, 26, 46, 0.9);
border-radius: 10px;
padding: 20px;
margin-bottom: 20px;
overflow: hidden;
font-size: 32px;
font-family: Arial, sans-serif;
text-align: center;
}
#captions {
line-height: 1.4;
word-wrap: break-word;
overflow-wrap: break-word;
}
/* Controls Bar */
#controls {
display: flex;
gap: 10px;
justify-content: center;
align-items: center;
padding: 15px;
background-color: var(--bg-secondary);
border-radius: var(--border-radius);
}
/* Buttons */
.btn {
display: inline-flex;
align-items: center;
gap: 8px;
padding: 12px 24px;
border: none;
border-radius: var(--border-radius);
font-size: 16px;
font-weight: 500;
cursor: pointer;
transition: background-color var(--transition), transform var(--transition);
}
.btn:hover:not(:disabled) {
transform: translateY(-1px);
}
.btn:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.btn-primary {
background-color: var(--accent);
color: white;
}
.btn-primary:hover:not(:disabled) {
background-color: var(--accent-hover);
}
.btn-danger {
background-color: var(--danger);
color: white;
}
.btn-danger:hover:not(:disabled) {
background-color: var(--danger-hover);
}
.btn-secondary {
background-color: var(--bg-tertiary);
color: var(--text-primary);
}
.btn-secondary:hover:not(:disabled) {
background-color: #323258;
}
.btn-success {
background-color: var(--success);
color: #000;
}
.btn-success:hover:not(:disabled) {
background-color: #5aff9a;
}
/* Toggle Switch */
.toggle-switch {
display: flex;
align-items: center;
gap: 10px;
cursor: pointer;
user-select: none;
padding: 8px 12px;
background-color: var(--bg-tertiary);
border-radius: var(--border-radius);
}
.toggle-switch input {
display: none;
}
.toggle-slider {
position: relative;
width: 44px;
height: 24px;
background-color: var(--text-secondary);
border-radius: 12px;
transition: background-color var(--transition);
}
.toggle-slider::before {
content: '';
position: absolute;
top: 3px;
left: 3px;
width: 18px;
height: 18px;
background-color: white;
border-radius: 50%;
transition: transform var(--transition);
}
.toggle-switch input:checked + .toggle-slider {
background-color: var(--success);
}
.toggle-switch input:checked + .toggle-slider::before {
transform: translateX(20px);
}
.toggle-label {
font-size: 14px;
font-weight: 500;
color: var(--text-primary);
}
.btn-icon {
width: 48px;
height: 48px;
padding: 0;
display: flex;
align-items: center;
justify-content: center;
background-color: var(--bg-tertiary);
color: var(--text-primary);
font-size: 20px;
}
.btn-icon:hover:not(:disabled) {
background-color: #323258;
}
.icon {
font-size: 14px;
}
/* Status Indicator */
#status {
display: flex;
align-items: center;
gap: 8px;
justify-content: center;
padding: 10px;
font-size: 14px;
color: var(--text-secondary);
}
.dot {
width: 10px;
height: 10px;
border-radius: 50%;
background-color: var(--text-secondary);
}
.dot.connected {
background-color: var(--success);
}
.dot.recording {
background-color: var(--danger);
animation: pulse 1s infinite;
}
.dot.disconnected {
background-color: var(--text-secondary);
}
.dot.error {
background-color: var(--warning);
}
@keyframes pulse {
0%, 100% { opacity: 1; }
50% { opacity: 0.5; }
}
/* Settings Panel */
.panel {
position: fixed;
top: 0;
right: 0;
width: 350px;
height: 100vh;
background-color: var(--bg-secondary);
box-shadow: -5px 0 20px rgba(0, 0, 0, 0.3);
z-index: 1000;
display: flex;
flex-direction: column;
transition: transform var(--transition);
}
.panel.hidden {
transform: translateX(100%);
}
.panel-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 20px;
border-bottom: 1px solid var(--bg-tertiary);
}
.panel-header h2 {
font-size: 20px;
font-weight: 600;
}
.btn-close {
width: 36px;
height: 36px;
border: none;
background: var(--bg-tertiary);
color: var(--text-primary);
font-size: 24px;
border-radius: 50%;
cursor: pointer;
transition: background-color var(--transition);
}
.btn-close:hover {
background-color: var(--danger);
}
.panel-content {
flex: 1;
overflow-y: auto;
padding: 20px;
}
/* Settings Groups */
.setting-group {
margin-bottom: 25px;
}
.setting-group h3 {
font-size: 14px;
font-weight: 600;
color: var(--accent);
text-transform: uppercase;
letter-spacing: 1px;
margin-bottom: 15px;
}
.setting-group label {
display: block;
font-size: 14px;
color: var(--text-secondary);
margin-bottom: 5px;
margin-top: 12px;
}
.setting-group label:first-of-type {
margin-top: 0;
}
/* Form Controls */
select,
input[type="text"] {
width: 100%;
padding: 10px 12px;
background-color: var(--bg-tertiary);
border: 1px solid transparent;
border-radius: var(--border-radius);
color: var(--text-primary);
font-size: 14px;
transition: border-color var(--transition);
}
select:focus,
input[type="text"]:focus {
outline: none;
border-color: var(--accent);
}
input[type="range"] {
width: 100%;
height: 6px;
background: var(--bg-tertiary);
border-radius: 3px;
appearance: none;
cursor: pointer;
}
input[type="range"]::-webkit-slider-thumb {
appearance: none;
width: 18px;
height: 18px;
background: var(--accent);
border-radius: 50%;
cursor: pointer;
transition: background-color var(--transition);
}
input[type="range"]::-webkit-slider-thumb:hover {
background: var(--accent-hover);
}
input[type="range"]::-moz-range-thumb {
width: 18px;
height: 18px;
background: var(--accent);
border-radius: 50%;
border: none;
cursor: pointer;
}
input[type="color"] {
width: 100%;
height: 40px;
padding: 2px;
background-color: var(--bg-tertiary);
border: 1px solid transparent;
border-radius: var(--border-radius);
cursor: pointer;
}
input[type="color"]::-webkit-color-swatch-wrapper {
padding: 0;
}
input[type="color"]::-webkit-color-swatch {
border: none;
border-radius: calc(var(--border-radius) - 3px);
}
/* Setting Actions */
.setting-actions {
display: flex;
flex-direction: column;
gap: 10px;
margin-top: 20px;
padding-top: 20px;
border-top: 1px solid var(--bg-tertiary);
}
.setting-actions .btn {
width: 100%;
justify-content: center;
}
/* Overlay */
#overlay {
position: fixed;
top: 0;
left: 0;
width: 100%;
height: 100%;
background-color: rgba(0, 0, 0, 0.5);
z-index: 999;
transition: opacity var(--transition);
}
#overlay.hidden {
opacity: 0;
pointer-events: none;
}
/* Scrollbar Styling */
::-webkit-scrollbar {
width: 8px;
}
::-webkit-scrollbar-track {
background: var(--bg-tertiary);
border-radius: 4px;
}
::-webkit-scrollbar-thumb {
background: var(--text-secondary);
border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
background: var(--accent);
}
/* Recordings Panel */
.recordings-list {
display: flex;
flex-direction: column;
gap: 8px;
}
.recordings-empty {
color: var(--text-secondary);
text-align: center;
padding: 40px 20px;
}
.recording-item {
display: flex;
justify-content: space-between;
align-items: center;
padding: 12px 15px;
background-color: var(--bg-tertiary);
border-radius: var(--border-radius);
cursor: pointer;
transition: background-color var(--transition);
}
.recording-item:hover {
background-color: #323258;
}
.recording-info {
display: flex;
flex-direction: column;
gap: 4px;
}
.recording-date {
font-size: 14px;
font-weight: 500;
color: var(--text-primary);
}
.recording-meta {
font-size: 12px;
color: var(--text-secondary);
}
.recording-arrow {
color: var(--text-secondary);
font-size: 18px;
}
/* Recording Viewer */
.recording-viewer {
display: flex;
flex-direction: column;
height: 100%;
}
.recording-viewer.hidden {
display: none;
}
.viewer-header {
display: flex;
align-items: center;
gap: 12px;
margin-bottom: 15px;
padding-bottom: 15px;
border-bottom: 1px solid var(--bg-tertiary);
}
.viewer-filename {
font-size: 12px;
color: var(--text-secondary);
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
.viewer-content {
flex: 1;
overflow-y: auto;
padding: 15px;
background-color: var(--bg-tertiary);
border-radius: var(--border-radius);
font-size: 14px;
line-height: 1.6;
white-space: pre-wrap;
word-wrap: break-word;
}
.viewer-actions {
display: flex;
justify-content: flex-end;
margin-top: 15px;
padding-top: 15px;
border-top: 1px solid var(--bg-tertiary);
}
.btn-small {
padding: 8px 16px;
font-size: 13px;
}
/* Responsive */
@media (max-width: 600px) {
#app {
padding: 10px;
}
.panel {
width: 100%;
}
#controls {
flex-wrap: wrap;
}
.btn {
padding: 10px 16px;
font-size: 14px;
}
}

static/js/app.js Normal file
@@ -0,0 +1,355 @@
/**
* Live Captions - Main Application
* Handles audio capture and WebSocket communication
*/
const App = {
// WebSocket connection
socket: null,
// Audio recording
mediaRecorder: null,
audioStream: null,
audioChunks: [],
isRecording: false,
recordingInterval: null,
// Continuous caption stream
wordBuffer: [],
pendingWords: [],
wordAnimationTimer: null,
// Auto-save recording state
sessionStartTime: null,
sessionTranscript: [],
// DOM elements
elements: {},
/**
* Initialize the application
*/
init() {
this.cacheElements();
this.bindEvents();
this.connectSocket();
// Initialize settings module
Settings.init();
},
/**
* Cache DOM element references
*/
cacheElements() {
this.elements = {
btnStart: document.getElementById('btn-start'),
btnStop: document.getElementById('btn-stop'),
btnClear: document.getElementById('btn-clear'),
autoSaveToggle: document.getElementById('auto-save-toggle'),
captions: document.getElementById('captions'),
statusDot: document.getElementById('status-dot'),
statusText: document.getElementById('status-text'),
};
},
/**
* Bind event listeners
*/
bindEvents() {
this.elements.btnStart.addEventListener('click', () => this.startRecording());
this.elements.btnStop.addEventListener('click', () => this.stopRecording());
this.elements.btnClear.addEventListener('click', () => this.clearCaptions());
// Load auto-save preference from localStorage
const savedPref = localStorage.getItem('autoSaveEnabled');
if (savedPref === 'true') {
this.elements.autoSaveToggle.checked = true;
}
// Save preference when toggled
this.elements.autoSaveToggle.addEventListener('change', (e) => {
localStorage.setItem('autoSaveEnabled', e.target.checked);
});
},
/**
* Connect to WebSocket server
*/
connectSocket() {
this.socket = io();
this.socket.on('connect', () => {
console.log('Connected to server');
this.setStatus('connected', 'Connected');
});
this.socket.on('disconnect', () => {
console.log('Disconnected from server');
this.setStatus('disconnected', 'Disconnected');
});
this.socket.on('transcription', (data) => {
this.addWords(data.text);
});
this.socket.on('settings_updated', (settings) => {
Settings.applySettings(settings);
});
this.socket.on('error', (data) => {
console.error('Server error:', data.message);
});
this.socket.on('recording_saved', (data) => {
console.log('Recording saved:', data.filename);
});
this.socket.on('recording_error', (data) => {
console.error('Recording error:', data.message);
});
},
/**
* Update status indicator
*/
setStatus(state, text) {
this.elements.statusDot.className = `dot ${state}`;
this.elements.statusText.textContent = text;
},
/**
* Start audio recording
*/
async startRecording() {
try {
this.audioStream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true,
noiseSuppression: true,
sampleRate: 16000,
}
});
this.isRecording = true;
this.elements.btnStart.disabled = true;
this.elements.btnStop.disabled = false;
this.setStatus('recording', 'Recording...');
// Reset session transcript for auto-save
this.sessionStartTime = new Date();
this.sessionTranscript = [];
// Start the recording cycle
this.startRecordingCycle();
} catch (error) {
console.error('Error starting recording:', error);
// Distinguish a denied permission from other failures (no device, etc.)
const message = error.name === 'NotAllowedError'
? 'Microphone access denied'
: 'Could not start microphone';
this.setStatus('error', message);
}
},
/**
* Start a recording cycle - record for a duration, then send and restart
*/
startRecordingCycle() {
if (!this.isRecording || !this.audioStream) return;
// Determine best supported MIME type
let mimeType = 'audio/webm';
if (MediaRecorder.isTypeSupported('audio/webm;codecs=opus')) {
mimeType = 'audio/webm;codecs=opus';
}
this.audioChunks = [];
this.mediaRecorder = new MediaRecorder(this.audioStream, { mimeType });
this.mediaRecorder.ondataavailable = (event) => {
if (event.data.size > 0) {
this.audioChunks.push(event.data);
}
};
this.mediaRecorder.onstop = () => {
// Create a complete blob from all chunks
if (this.audioChunks.length > 0) {
const blob = new Blob(this.audioChunks, { type: mimeType });
this.sendAudioBlob(blob);
}
// Start next cycle if still recording
if (this.isRecording) {
this.startRecordingCycle();
}
};
// Start recording
this.mediaRecorder.start();
// Stop after a fixed duration so each cycle yields a complete,
// self-contained blob; 1.5 s keeps captioning responsive
const chunkDuration = 1500; // ms
this.recordingInterval = setTimeout(() => {
if (this.mediaRecorder && this.mediaRecorder.state === 'recording') {
this.mediaRecorder.stop();
}
}, chunkDuration);
},
/**
* Stop audio recording
*/
stopRecording() {
this.isRecording = false;
// Clear the recording interval
if (this.recordingInterval) {
clearTimeout(this.recordingInterval);
this.recordingInterval = null;
}
// Stop the media recorder
if (this.mediaRecorder && this.mediaRecorder.state === 'recording') {
this.mediaRecorder.stop();
}
// Stop all tracks
if (this.audioStream) {
this.audioStream.getTracks().forEach(track => track.stop());
this.audioStream = null;
}
this.elements.btnStart.disabled = false;
this.elements.btnStop.disabled = true;
this.setStatus('connected', 'Connected');
// Auto-save if enabled and we have content
if (this.elements.autoSaveToggle.checked && this.sessionTranscript.length > 0) {
this.saveRecording();
}
},
/**
* Send complete audio blob to server
*/
sendAudioBlob(blob) {
const reader = new FileReader();
reader.onloadend = () => {
// Get base64 data without the data URL prefix
const base64 = reader.result.split(',')[1];
this.socket.emit('audio_data', {
audio: base64,
format: 'webm'
});
};
reader.readAsDataURL(blob);
},
/**
* Add words to the continuous caption stream
*/
addWords(text) {
if (!text.trim()) return;
// Split incoming text into words
const newWords = text.trim().split(/\s+/);
// Add to pending queue for animated display
this.pendingWords.push(...newWords);
// Accumulate to session transcript for auto-save
if (this.isRecording) {
this.sessionTranscript.push(...newWords);
}
// Start animation if not already running
if (!this.wordAnimationTimer) {
this.animateNextWord();
}
},
/**
* Animate words appearing one by one
*/
animateNextWord() {
if (this.pendingWords.length === 0) {
this.wordAnimationTimer = null;
return;
}
// Get next word from queue
const word = this.pendingWords.shift();
this.wordBuffer.push(word);
// Get max words from settings
const maxWords = Settings.current.max_words || 30;
// Trim buffer to max words
while (this.wordBuffer.length > maxWords) {
this.wordBuffer.shift();
}
// Update display
this.updateCaptionDisplay();
// Calculate delay based on pending words
// Faster if more words pending, slower if caught up
const baseDelay = 80; // ms per word
const minDelay = 30;
const delay = this.pendingWords.length > 10 ? minDelay : baseDelay;
// Schedule next word
this.wordAnimationTimer = setTimeout(() => {
this.animateNextWord();
}, delay);
},
/**
* Update the caption display with current word buffer
*/
updateCaptionDisplay() {
const text = this.wordBuffer.join(' ');
this.elements.captions.textContent = text;
},
/**
* Clear all captions
*/
clearCaptions() {
// Clear animation timer
if (this.wordAnimationTimer) {
clearTimeout(this.wordAnimationTimer);
this.wordAnimationTimer = null;
}
this.wordBuffer = [];
this.pendingWords = [];
this.elements.captions.textContent = '';
},
/**
* Save the current recording session
*/
saveRecording() {
if (!this.sessionStartTime) return;
const endTime = new Date();
const transcript = this.sessionTranscript.join(' ');
this.socket.emit('save_recording', {
startTime: this.sessionStartTime.toISOString(),
endTime: endTime.toISOString(),
transcript: transcript,
wordCount: this.sessionTranscript.length
});
// Reset session state
this.sessionStartTime = null;
this.sessionTranscript = [];
}
};
// Initialize when DOM is ready
document.addEventListener('DOMContentLoaded', () => {
App.init();
});

static/js/recordings.js Normal file
@@ -0,0 +1,204 @@
/**
* Live Captions - Recordings Panel
* Handles viewing and managing saved recordings
*/
const Recordings = {
// Current state
recordings: [],
currentRecording: null,
// DOM elements
elements: {},
/**
* Initialize the recordings panel
*/
init() {
this.cacheElements();
this.bindEvents();
},
/**
* Cache DOM element references
*/
cacheElements() {
this.elements = {
btnRecordings: document.getElementById('btn-recordings'),
btnClose: document.getElementById('btn-close-recordings'),
btnBackToList: document.getElementById('btn-back-to-list'),
btnDelete: document.getElementById('btn-delete-recording'),
panel: document.getElementById('recordings-panel'),
overlay: document.getElementById('overlay'),
recordingsList: document.getElementById('recordings-list'),
recordingViewer: document.getElementById('recording-viewer'),
viewerFilename: document.getElementById('viewer-filename'),
viewerContent: document.getElementById('viewer-content'),
};
},
/**
* Bind event listeners
*/
bindEvents() {
this.elements.btnRecordings.addEventListener('click', () => this.openPanel());
this.elements.btnClose.addEventListener('click', () => this.closePanel());
this.elements.btnBackToList.addEventListener('click', () => this.showList());
this.elements.btnDelete.addEventListener('click', () => this.deleteCurrentRecording());
// Close on overlay click, but only when this panel is open (the overlay is shared with the settings panel)
this.elements.overlay.addEventListener('click', () => {
if (!this.elements.panel.classList.contains('hidden')) {
this.closePanel();
}
});
},
/**
* Open the recordings panel
*/
openPanel() {
this.elements.panel.classList.remove('hidden');
this.elements.overlay.classList.remove('hidden');
this.showList();
this.loadRecordings();
},
/**
* Close the recordings panel
*/
closePanel() {
this.elements.panel.classList.add('hidden');
this.elements.overlay.classList.add('hidden');
this.currentRecording = null;
},
/**
* Show the recordings list view
*/
showList() {
this.elements.recordingsList.classList.remove('hidden');
this.elements.recordingViewer.classList.add('hidden');
},
/**
* Show the recording viewer
*/
showViewer() {
this.elements.recordingsList.classList.add('hidden');
this.elements.recordingViewer.classList.remove('hidden');
},
/**
* Load recordings from the API
*/
async loadRecordings() {
this.elements.recordingsList.innerHTML = '<p class="recordings-empty">Loading recordings...</p>';
try {
const response = await fetch('/api/recordings');
if (!response.ok) throw new Error('Failed to load recordings');
this.recordings = await response.json();
this.renderRecordingsList();
} catch (error) {
console.error('Error loading recordings:', error);
this.elements.recordingsList.innerHTML =
'<p class="recordings-empty">Failed to load recordings</p>';
}
},
/**
* Render the recordings list
*/
renderRecordingsList() {
if (this.recordings.length === 0) {
this.elements.recordingsList.innerHTML =
'<p class="recordings-empty">No recordings yet.<br>Enable auto-save and record some captions!</p>';
return;
}
const html = this.recordings.map(recording => `
<div class="recording-item" data-filename="${recording.filename}">
<div class="recording-info">
<span class="recording-date">${recording.date}</span>
<span class="recording-meta">${this.formatFileSize(recording.size)}</span>
</div>
<span class="recording-arrow">&rsaquo;</span>
</div>
`).join('');
this.elements.recordingsList.innerHTML = html;
// Bind click events to items
this.elements.recordingsList.querySelectorAll('.recording-item').forEach(item => {
item.addEventListener('click', () => {
const filename = item.dataset.filename;
this.viewRecording(filename);
});
});
},
/**
* Format file size in human-readable format
*/
formatFileSize(bytes) {
if (bytes < 1024) return bytes + ' B';
if (bytes < 1024 * 1024) return (bytes / 1024).toFixed(1) + ' KB';
return (bytes / (1024 * 1024)).toFixed(1) + ' MB';
},
/**
* View a specific recording
*/
async viewRecording(filename) {
try {
const response = await fetch(`/api/recordings/${encodeURIComponent(filename)}`);
if (!response.ok) throw new Error('Failed to load recording');
const data = await response.json();
this.currentRecording = filename;
this.elements.viewerFilename.textContent = filename;
this.elements.viewerContent.textContent = data.content;
this.showViewer();
} catch (error) {
console.error('Error loading recording:', error);
alert('Failed to load recording');
}
},
/**
* Delete the currently viewed recording
*/
async deleteCurrentRecording() {
if (!this.currentRecording) return;
if (!confirm('Are you sure you want to delete this recording?')) {
return;
}
try {
const response = await fetch(`/api/recordings/${encodeURIComponent(this.currentRecording)}`, {
method: 'DELETE'
});
if (!response.ok) throw new Error('Failed to delete recording');
// Remove from local list
this.recordings = this.recordings.filter(r => r.filename !== this.currentRecording);
this.currentRecording = null;
// Go back to list
this.showList();
this.renderRecordingsList();
} catch (error) {
console.error('Error deleting recording:', error);
alert('Failed to delete recording');
}
}
};
// Initialize when DOM is ready
document.addEventListener('DOMContentLoaded', () => {
Recordings.init();
});

static/js/settings.js Normal file
@@ -0,0 +1,259 @@
/**
* Settings Panel Module
* Handles user settings UI and persistence
*/
const Settings = {
// Current settings state
current: {},
// DOM elements
elements: {},
/**
* Initialize the settings module
*/
init() {
this.cacheElements();
this.bindEvents();
// Load persisted settings from the server on startup
this.fetchSettings();
},
/**
* Cache DOM element references
*/
cacheElements() {
this.elements = {
panel: document.getElementById('settings-panel'),
overlay: document.getElementById('overlay'),
btnSettings: document.getElementById('btn-settings'),
btnClose: document.getElementById('btn-close-settings'),
btnSave: document.getElementById('btn-save-settings'),
btnReset: document.getElementById('btn-reset-settings'),
// Text settings
fontFamily: document.getElementById('font-family'),
fontSize: document.getElementById('font-size'),
fontSizeValue: document.getElementById('font-size-value'),
fontWeight: document.getElementById('font-weight'),
textColor: document.getElementById('text-color'),
textAlign: document.getElementById('text-align'),
// Background settings
backgroundColor: document.getElementById('background-color'),
backgroundOpacity: document.getElementById('background-opacity'),
opacityValue: document.getElementById('opacity-value'),
borderRadius: document.getElementById('border-radius'),
radiusValue: document.getElementById('radius-value'),
padding: document.getElementById('padding'),
paddingValue: document.getElementById('padding-value'),
// Behavior settings
maxWords: document.getElementById('max-words'),
maxWordsValue: document.getElementById('max-words-value'),
// Caption display
captionContainer: document.getElementById('caption-container'),
};
},
/**
* Bind event listeners
*/
bindEvents() {
// Panel open/close
this.elements.btnSettings.addEventListener('click', () => this.openPanel());
this.elements.btnClose.addEventListener('click', () => this.closePanel());
this.elements.overlay.addEventListener('click', () => this.closePanel());
// Save/Reset
this.elements.btnSave.addEventListener('click', () => this.saveSettings());
this.elements.btnReset.addEventListener('click', () => this.resetSettings());
// Live preview on input change
const inputs = [
'fontFamily', 'fontSize', 'fontWeight', 'textColor', 'textAlign',
'backgroundColor', 'backgroundOpacity', 'borderRadius', 'padding',
'maxWords'
];
inputs.forEach(name => {
const element = this.elements[name];
if (element) {
element.addEventListener('input', () => this.updatePreview());
}
});
// Update value displays for range inputs
this.elements.fontSize.addEventListener('input', (e) => {
this.elements.fontSizeValue.textContent = e.target.value;
});
this.elements.backgroundOpacity.addEventListener('input', (e) => {
this.elements.opacityValue.textContent = e.target.value;
});
this.elements.borderRadius.addEventListener('input', (e) => {
this.elements.radiusValue.textContent = e.target.value;
});
this.elements.padding.addEventListener('input', (e) => {
this.elements.paddingValue.textContent = e.target.value;
});
this.elements.maxWords.addEventListener('input', (e) => {
this.elements.maxWordsValue.textContent = e.target.value;
});
},
/**
* Open settings panel
*/
openPanel() {
this.elements.panel.classList.remove('hidden');
this.elements.overlay.classList.remove('hidden');
},
/**
* Close settings panel
*/
closePanel() {
this.elements.panel.classList.add('hidden');
this.elements.overlay.classList.add('hidden');
},
/**
* Apply settings to the UI
*/
applySettings(settings) {
this.current = settings;
// Update form values
this.elements.fontFamily.value = settings.font_family;
this.elements.fontSize.value = settings.font_size;
this.elements.fontSizeValue.textContent = settings.font_size;
this.elements.fontWeight.value = settings.font_weight;
this.elements.textColor.value = settings.text_color;
this.elements.textAlign.value = settings.text_align;
this.elements.backgroundColor.value = settings.background_color;
this.elements.backgroundOpacity.value = Math.round(settings.background_opacity * 100);
this.elements.opacityValue.textContent = Math.round(settings.background_opacity * 100);
this.elements.borderRadius.value = settings.border_radius;
this.elements.radiusValue.textContent = settings.border_radius;
this.elements.padding.value = settings.padding;
this.elements.paddingValue.textContent = settings.padding;
this.elements.maxWords.value = settings.max_words || 30;
this.elements.maxWordsValue.textContent = settings.max_words || 30;
// Apply to caption container
this.updatePreview();
},
/**
* Update live preview of caption styling
*/
updatePreview() {
const container = this.elements.captionContainer;
const opacity = this.elements.backgroundOpacity.value / 100;
// Parse background color and apply opacity
const bgColor = this.elements.backgroundColor.value;
const r = parseInt(bgColor.slice(1, 3), 16);
const g = parseInt(bgColor.slice(3, 5), 16);
const b = parseInt(bgColor.slice(5, 7), 16);
container.style.fontFamily = this.elements.fontFamily.value;
container.style.fontSize = `${this.elements.fontSize.value}px`;
container.style.fontWeight = this.elements.fontWeight.value;
container.style.color = this.elements.textColor.value;
container.style.textAlign = this.elements.textAlign.value;
container.style.backgroundColor = `rgba(${r}, ${g}, ${b}, ${opacity})`;
container.style.borderRadius = `${this.elements.borderRadius.value}px`;
container.style.padding = `${this.elements.padding.value}px`;
// Store max words for caption management
this.current.max_words = parseInt(this.elements.maxWords.value);
},
/**
* Get current form values as settings object
*/
getFormValues() {
return {
font_family: this.elements.fontFamily.value,
font_size: parseInt(this.elements.fontSize.value),
font_weight: this.elements.fontWeight.value,
text_color: this.elements.textColor.value,
text_align: this.elements.textAlign.value,
background_color: this.elements.backgroundColor.value,
background_opacity: this.elements.backgroundOpacity.value / 100,
border_radius: parseInt(this.elements.borderRadius.value),
padding: parseInt(this.elements.padding.value),
max_words: parseInt(this.elements.maxWords.value),
};
},
/**
* Save settings to server
*/
async saveSettings() {
const settings = this.getFormValues();
try {
const response = await fetch('/api/settings', {
method: 'PUT',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(settings),
});
if (response.ok) {
this.current = await response.json();
this.closePanel();
console.log('Settings saved');
} else {
console.error('Failed to save settings');
}
} catch (error) {
console.error('Error saving settings:', error);
}
},
/**
* Reset settings to defaults
*/
async resetSettings() {
if (!confirm('Reset all settings to defaults?')) {
return;
}
try {
const response = await fetch('/api/settings/reset', {
method: 'POST',
});
if (response.ok) {
const settings = await response.json();
this.applySettings(settings);
console.log('Settings reset to defaults');
} else {
console.error('Failed to reset settings');
}
} catch (error) {
console.error('Error resetting settings:', error);
}
},
/**
* Fetch settings from server
*/
async fetchSettings() {
try {
const response = await fetch('/api/settings');
if (response.ok) {
const settings = await response.json();
this.applySettings(settings);
}
} catch (error) {
console.error('Error fetching settings:', error);
}
}
};

templates/index.html Normal file
@@ -0,0 +1,159 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Live Captions</title>
<link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
<div id="app">
<!-- Caption Display Area -->
<div id="caption-container">
<div id="captions"></div>
</div>
<!-- Controls Bar -->
<div id="controls">
<button id="btn-start" class="btn btn-primary">
<span class="icon">&#9658;</span> Start
</button>
<button id="btn-stop" class="btn btn-danger" disabled>
<span class="icon">&#9632;</span> Stop
</button>
<button id="btn-clear" class="btn btn-secondary">
Clear
</button>
<label class="toggle-switch" title="Auto-save recordings">
<input type="checkbox" id="auto-save-toggle">
<span class="toggle-slider"></span>
<span class="toggle-label">Auto-save</span>
</label>
<button id="btn-recordings" class="btn btn-icon" title="Recordings">
&#128203;
</button>
<button id="btn-settings" class="btn btn-icon" title="Settings">
&#9881;
</button>
</div>
<!-- Status Indicator -->
<div id="status">
<span id="status-dot" class="dot"></span>
<span id="status-text">Ready</span>
</div>
<!-- Settings Panel -->
<div id="settings-panel" class="panel hidden">
<div class="panel-header">
<h2>Settings</h2>
<button id="btn-close-settings" class="btn-close">&times;</button>
</div>
<div class="panel-content">
<!-- Font Settings -->
<div class="setting-group">
<h3>Text</h3>
<label for="font-family">Font Family</label>
<select id="font-family">
<option value="Arial, sans-serif">Arial</option>
<option value="'Helvetica Neue', Helvetica, sans-serif">Helvetica</option>
<option value="'Segoe UI', sans-serif">Segoe UI</option>
<option value="'Roboto', sans-serif">Roboto</option>
<option value="'Open Sans', sans-serif">Open Sans</option>
<option value="Georgia, serif">Georgia</option>
<option value="'Times New Roman', serif">Times New Roman</option>
<option value="'Courier New', monospace">Courier New</option>
<option value="monospace">Monospace</option>
</select>
<label for="font-size">Font Size: <span id="font-size-value">32</span>px</label>
<input type="range" id="font-size" min="16" max="72" value="32">
<label for="font-weight">Font Weight</label>
<select id="font-weight">
<option value="normal">Normal</option>
<option value="bold">Bold</option>
<option value="lighter">Light</option>
</select>
<label for="text-color">Text Color</label>
<input type="color" id="text-color" value="#ffffff">
<label for="text-align">Text Alignment</label>
<select id="text-align">
<option value="left">Left</option>
<option value="center">Center</option>
<option value="right">Right</option>
</select>
</div>
<!-- Background Settings -->
<div class="setting-group">
<h3>Background</h3>
<label for="background-color">Background Color</label>
<input type="color" id="background-color" value="#1a1a2e">
<label for="background-opacity">Opacity: <span id="opacity-value">90</span>%</label>
<input type="range" id="background-opacity" min="0" max="100" value="90">
<label for="border-radius">Corner Radius: <span id="radius-value">10</span>px</label>
<input type="range" id="border-radius" min="0" max="30" value="10">
<label for="padding">Padding: <span id="padding-value">20</span>px</label>
<input type="range" id="padding" min="5" max="50" value="20">
</div>
<!-- Caption Behavior -->
<div class="setting-group">
<h3>Behavior</h3>
<label for="max-words">Max Words: <span id="max-words-value">30</span></label>
<input type="range" id="max-words" min="1" max="100" value="30">
</div>
<!-- Actions -->
<div class="setting-actions">
<button id="btn-save-settings" class="btn btn-primary">Save Settings</button>
<button id="btn-reset-settings" class="btn btn-secondary">Reset to Defaults</button>
</div>
</div>
</div>
<!-- Recordings Panel -->
<div id="recordings-panel" class="panel hidden">
<div class="panel-header">
<h2>Recordings</h2>
<button id="btn-close-recordings" class="btn-close">&times;</button>
</div>
<div class="panel-content">
<!-- Recordings List -->
<div id="recordings-list" class="recordings-list">
<p class="recordings-empty">Loading recordings...</p>
</div>
<!-- Recording Viewer -->
<div id="recording-viewer" class="recording-viewer hidden">
<div class="viewer-header">
<button id="btn-back-to-list" class="btn btn-secondary btn-small">&larr; Back</button>
<span id="viewer-filename" class="viewer-filename"></span>
</div>
<div id="viewer-content" class="viewer-content"></div>
<div class="viewer-actions">
<button id="btn-delete-recording" class="btn btn-danger btn-small">Delete</button>
</div>
</div>
</div>
</div>
<!-- Overlay for panels -->
<div id="overlay" class="hidden"></div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.7.2/socket.io.min.js"></script>
<script src="/static/js/settings.js"></script>
<script src="/static/js/recordings.js"></script>
<script src="/static/js/app.js"></script>
</body>
</html>

transcriber.py Normal file
@@ -0,0 +1,102 @@
"""
Whisper transcription module using faster-whisper.
"""
import os
import io
import tempfile
import logging
from faster_whisper import WhisperModel
from pydub import AudioSegment
logger = logging.getLogger(__name__)
# Global model instance (loaded once)
_model = None
def get_model():
"""Get or initialize the Whisper model."""
global _model
if _model is None:
model_size = os.environ.get('WHISPER_MODEL', 'base')
device = os.environ.get('WHISPER_DEVICE', 'cpu')
compute_type = os.environ.get('WHISPER_COMPUTE_TYPE', 'int8')
logger.info(f"Loading Whisper model: {model_size} on {device} ({compute_type})")
_model = WhisperModel(
model_size,
device=device,
compute_type=compute_type
)
logger.info("Whisper model loaded successfully")
return _model
def transcribe_audio(audio_bytes, format='webm'):
"""
Transcribe audio bytes to text.
Args:
audio_bytes: Raw audio data
format: Audio format (default: webm)
Returns:
Transcribed text string
"""
if not audio_bytes:
return ""
try:
# Decode the incoming audio (pydub shells out to ffmpeg for webm/opus)
audio = AudioSegment.from_file(
io.BytesIO(audio_bytes),
format=format
)
# Convert to 16kHz mono WAV (Whisper's expected format)
audio = audio.set_frame_rate(16000).set_channels(1)
# Export to temporary file (faster-whisper needs a file path)
with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:
audio.export(tmp.name, format='wav')
tmp_path = tmp.name
try:
# Transcribe
model = get_model()
segments, info = model.transcribe(
tmp_path,
beam_size=5,
vad_filter=True,
vad_parameters=dict(
min_silence_duration_ms=500
)
)
# Combine all segments into text
text = ' '.join(segment.text.strip() for segment in segments)
return text.strip()
finally:
# Clean up temp file
if os.path.exists(tmp_path):
os.unlink(tmp_path)
except Exception as e:
logger.error(f"Transcription error: {e}")
return ""
def preload_model():
"""Preload the model during startup."""
try:
get_model()
return True
except Exception as e:
logger.error(f"Failed to preload model: {e}")
return False
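
The client (`sendAudioBlob()` in static/js/app.js) base64-encodes each WebM blob and emits it on the `audio_data` event; a server-side handler would reverse that encoding before calling `transcribe_audio()`. A minimal sketch of the decode step, using a stand-in payload (the helper name and the fake bytes are illustrative, not part of this commit):

```python
import base64

# Shape of the message emitted by sendAudioBlob() in static/js/app.js:
# { "audio": "<base64, data-URL prefix already stripped>", "format": "webm" }
def decode_audio_payload(payload):
    """Recover raw audio bytes from the client's base64 payload."""
    return base64.b64decode(payload["audio"])

# Stand-in payload; real data would be a WebM/Opus blob from the browser.
fake_webm = b"\x1aE\xdf\xa3fake-webm-bytes"  # 0x1A45DFA3 is the EBML magic
payload = {
    "audio": base64.b64encode(fake_webm).decode("ascii"),
    "format": "webm",
}

audio_bytes = decode_audio_payload(payload)
assert audio_bytes == fake_webm
# audio_bytes would then be handed to
# transcribe_audio(audio_bytes, format=payload["format"]).
```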