
Media Job Queue System

Overview

The Media Job Queue System provides asynchronous background processing for CPU and GPU-intensive video operations. Built on a custom job queue with resource-aware scheduling, it handles everything from directory scanning to AI-powered video analysis while maintaining system stability through resource category management.

Key Features:

  • Resource Categories — Jobs classified by resource needs (CPU, GPU encode, GPU AI)
  • Priority Scheduling — High-priority jobs processed first within same category
  • Job Types — 15+ types (compilation, encoding, digest generation, scene extraction, etc.)
  • Progress Tracking — Real-time progress updates (0-100%)
  • Status Management — Pending → Queued → Running → Completed/Failed lifecycle
  • Retry Logic — Failed jobs can be retried with exponential backoff
  • Detailed Logging — Execution logs for debugging and audit trail
  • Queue Management — Pause, resume, cancel, and prioritize jobs
  • VRAM Awareness — Prevents GPU memory exhaustion by tracking VRAM requirements

Access Control:

  • Job viewing/management requires SUPER_ADMIN role
  • Job creation can be triggered by admins or automated workflows

Technology Stack:

  • Database Queue — PostgreSQL-backed job queue (no BullMQ for media)
  • Worker Process — Node.js worker polling the queue every 5 seconds
  • FFmpeg — Video encoding and compilation
  • AI Integration — Future support for scene detection and auto-tagging

Architecture

flowchart TB
    subgraph "Job Creation"
        A1[Admin Action]
        A2[Automated Trigger]
        A3[Scheduled Task]
    end

    subgraph "Job Queue (PostgreSQL)"
        Q1[Pending Jobs]
        Q2[Queued Jobs]
        Q3[Running Jobs]
        Q4[Completed/Failed Jobs]
    end

    subgraph "Worker Process"
        W1[Job Poller<br/>Every 5s]
        W2[Resource Checker]
        W3[Job Executor]
        W4[Progress Updater]
    end

    subgraph "Processors"
        P1[CPU Jobs<br/>scan, validate]
        P2[GPU Encode<br/>reencode, compile]
        P3[GPU AI<br/>digest, tag, scene]
    end

    subgraph "Results"
        R1[Video Records Updated]
        R2[New Files Created]
        R3[Logs Written]
    end

    A1 --> Q1
    A2 --> Q1
    A3 --> Q1

    Q1 --> W1
    W1 --> W2
    W2 -->|Check Resources| Q2
    Q2 --> W3

    W3 --> P1
    W3 --> P2
    W3 --> P3

    W3 --> W4
    W4 --> Q3

    P1 --> R1
    P2 --> R2
    P3 --> R3

    Q3 --> Q4

    style Q1 fill:#f9f
    style Q3 fill:#ff9
    style Q4 fill:#9f9

Workflow:

  1. Job Creation — Admin clicks "Re-encode" button, API creates job record
  2. Queue Polling — Worker checks for pending jobs every 5 seconds
  3. Resource Check — Worker verifies sufficient VRAM/CPU available
  4. Job Execution — Worker runs appropriate processor (FFmpeg, AI script, etc.)
  5. Progress Updates — Worker updates job progress every ~5% completion
  6. Completion — Worker marks job complete and logs results
  7. Retry on Failure — Failed jobs can be retried with exponential backoff

Database Model

Jobs Table Schema

// api/src/modules/media/db/schema.ts
export const jobs = pgTable('jobs', {
  id: uuid('id').primaryKey().defaultRandom(),

  // Job Definition
  type: text('type').notNull(), // JobType enum: compilation, scan, reencode, etc.
  status: text('status').notNull().default('pending'), // JobStatus enum
  params: jsonb('params').$type<Record<string, any>>().notNull(), // Job-specific parameters

  // Progress Tracking
  progress: integer('progress').default(0), // 0-100
  log: text('log').default(''), // Execution log (append-only)

  // Scheduling
  priority: integer('priority').default(5), // 1 (highest) - 10 (lowest)
  queuePosition: integer('queue_position'), // Position in queue
  waitingReason: text('waiting_reason'), // Why job is waiting (e.g., "Insufficient VRAM")

  // Resource Management
  resourceCategory: text('resource_category').notNull(), // cpu|gpu_encode|gpu_ai
  vramRequired: integer('vram_required').default(0), // MB of VRAM needed

  // Timing
  createdAt: timestamp('created_at').defaultNow(),
  startedAt: timestamp('started_at'),
  completedAt: timestamp('completed_at'),

  // Retry Logic
  retryCount: integer('retry_count').default(0),
  maxRetries: integer('max_retries').default(3),
  retryAfter: timestamp('retry_after'), // Don't retry before this time
});

Job Types Enum

Type Resource Category VRAM (MB) Description
scan cpu 0 Scan directory for new videos
public_scan cpu 0 Scan public gallery directory
validate cpu 0 Validate video metadata (FFprobe)
reencode_streaming gpu_encode 4000 Re-encode for web playback (H.264)
compile_random gpu_encode 2000 Random video compilation
compile_quad gpu_encode 4000 4-up grid compilation
compile_mega gpu_encode 6000 Large multi-video compilation
compile_gif cpu 0 Create GIF from video
digest_generate gpu_ai 8000 AI-powered video digest
clip_generate gpu_ai 6000 Extract clips from digest
highlight_generate gpu_ai 8000 Create highlight reel
tag_generation gpu_ai 6000 AI auto-tagging
scene_extract gpu_ai 8000 Scene detection and extraction
thumbnail_generate cpu 0 Generate thumbnail from video
move_to_library cpu 0 Move video from inbox to target directory
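The table above can be mirrored in a small typed registry so that job creation and the worker agree on per-type defaults. A sketch (the registry name and the CPU fallback are illustrative, and only a few rows are shown):

```typescript
// Hypothetical registry mirroring the job types table; VRAM values in MB.
type ResourceCategory = 'cpu' | 'gpu_encode' | 'gpu_ai';

interface JobTypeSpec {
  category: ResourceCategory;
  vramRequired: number; // MB
}

const JOB_TYPES: Record<string, JobTypeSpec> = {
  scan: { category: 'cpu', vramRequired: 0 },
  validate: { category: 'cpu', vramRequired: 0 },
  reencode_streaming: { category: 'gpu_encode', vramRequired: 4000 },
  compile_random: { category: 'gpu_encode', vramRequired: 2000 },
  digest_generate: { category: 'gpu_ai', vramRequired: 8000 },
  thumbnail_generate: { category: 'cpu', vramRequired: 0 },
};

// Look up defaults at job creation time; unknown types fall back to a
// zero-VRAM CPU job (a conservative, illustrative choice).
function specFor(type: string): JobTypeSpec {
  return JOB_TYPES[type] ?? { category: 'cpu', vramRequired: 0 };
}
```

Keeping category and VRAM in one place avoids the API and the worker drifting apart on resource requirements.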

Job Status Enum

Status Description Final State
pending Waiting to be picked up by worker No
queued Selected by worker, waiting for resources No
running Currently executing No
completed Finished successfully Yes
failed Execution failed (see log for details) Yes
cancelled Manually cancelled by admin Yes
paused Temporarily paused (can be resumed) No

Resource Categories

Category Typical VRAM Concurrent Limit Use Cases
cpu 0 MB 5 Scanning, validation, simple encodes, GIF creation
gpu_encode 2-6 GB 2 Video re-encoding, compilation, format conversion
gpu_ai 6-12 GB 1 AI tagging, scene detection, digest generation, highlight extraction

VRAM Management:

The worker tracks total VRAM usage across running jobs:

const runningJobs = await db.select().from(jobs).where(eq(jobs.status, 'running'));
const totalVramUsed = runningJobs.reduce((sum, job) => sum + (job.vramRequired || 0), 0);

// Only start new job if VRAM available
const TOTAL_VRAM = 16000; // 16GB GPU
if (totalVramUsed + newJob.vramRequired <= TOTAL_VRAM) {
  startJob(newJob);
}

API Endpoints

All endpoints require SUPER_ADMIN role.

List Jobs

GET /api/media/jobs

Query Parameters:

Parameter Type Default Description
page number 1 Page number
limit number 20 Results per page
status string - Filter by status (pending, running, completed, failed)
type string - Filter by job type
resourceCategory string - Filter by resource category

Response:

{
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "type": "reencode_streaming",
      "status": "running",
      "progress": 45,
      "resourceCategory": "gpu_encode",
      "vramRequired": 4000,
      "priority": 5,
      "params": {
        "videoId": "660e8400-e29b-41d4-a716-446655440001",
        "targetBitrate": 2000
      },
      "startedAt": "2026-02-13T10:30:00Z",
      "createdAt": "2026-02-13T10:25:00Z"
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "total": 156,
    "totalPages": 8
  }
}

Get Job Details

GET /api/media/jobs/:id

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "reencode_streaming",
  "status": "completed",
  "progress": 100,
  "log": "Starting re-encode...\nFFmpeg command: ffmpeg -i input.mp4 -c:v h264 -preset medium -crf 23 output.mp4\nProgress: 25%\nProgress: 50%\nProgress: 75%\nProgress: 100%\nCompleted successfully",
  "params": {
    "videoId": "660e8400-e29b-41d4-a716-446655440001",
    "inputPath": "inbox/original.mp4",
    "outputPath": "playback/encoded.mp4",
    "targetBitrate": 2000
  },
  "resourceCategory": "gpu_encode",
  "vramRequired": 4000,
  "priority": 5,
  "retryCount": 0,
  "maxRetries": 3,
  "createdAt": "2026-02-13T10:25:00Z",
  "startedAt": "2026-02-13T10:30:00Z",
  "completedAt": "2026-02-13T10:45:00Z"
}

Create Job

POST /api/media/jobs

Request Body:

{
  "type": "reencode_streaming",
  "params": {
    "videoId": "660e8400-e29b-41d4-a716-446655440001",
    "targetBitrate": 2000
  },
  "priority": 5,
  "resourceCategory": "gpu_encode",
  "vramRequired": 4000
}

Response:

{
  "id": "770e8400-e29b-41d4-a716-446655440002",
  "type": "reencode_streaming",
  "status": "pending",
  "progress": 0,
  "createdAt": "2026-02-13T11:00:00Z"
}

Retry Failed Job

POST /api/media/jobs/:id/retry

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "retryCount": 1,
  "retryAfter": null,
  "log": "Starting re-encode...\n[Previous logs...]\n--- RETRY ATTEMPT 1 ---\n"
}

Retry Logic:

  • Failed jobs can be retried up to maxRetries times (default: 3)
  • Exponential backoff: wait 2^retryCount minutes before retry
  • Retry resets status to pending and appends retry marker to log
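The backoff schedule above can be computed directly from the retry count; a minimal sketch (function names are illustrative):

```typescript
// Exponential backoff as documented: the Nth retry waits 2^N minutes,
// where retryCount is the attempt number about to be scheduled (1-based).
function retryDelayMs(retryCount: number): number {
  return Math.pow(2, retryCount) * 60 * 1000;
}

// Earliest time the worker may pick the job up again (what the schema's
// retryAfter column stores).
function nextRetryAt(now: Date, retryCount: number): Date {
  return new Date(now.getTime() + retryDelayMs(retryCount));
}
```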

Cancel Job

POST /api/media/jobs/:id/cancel

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "cancelled",
  "log": "Starting re-encode...\nProgress: 25%\n--- JOB CANCELLED BY ADMIN ---"
}

Notes:

  • Running jobs cannot be cancelled immediately (worker must finish current chunk)
  • Pending/queued jobs cancelled instantly

Pause/Resume Job

POST /api/media/jobs/:id/pause
POST /api/media/jobs/:id/resume

Pause Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "paused"
}

Resume Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending"
}

Queue Statistics

GET /api/media/jobs/stats

Response:

{
  "pending": 12,
  "queued": 2,
  "running": 3,
  "completed": 1458,
  "failed": 23,
  "paused": 1,
  "totalVramUsed": 12000,
  "totalVramAvailable": 16000,
  "averageProcessingTime": 245,
  "jobsByType": {
    "reencode_streaming": 45,
    "scan": 8,
    "compile_random": 12
  }
}
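A stats payload like this can be derived in one pass over the job rows; a sketch under the schema above (the accumulator shape and function name are illustrative):

```typescript
// Minimal row shape needed for aggregation (subset of the jobs table).
interface JobRow {
  type: string;
  status: string;
  vramRequired: number;
}

interface QueueStats {
  counts: Record<string, number>;     // jobs per status
  totalVramUsed: number;              // MB, running jobs only
  jobsByType: Record<string, number>; // jobs per type
}

function computeStats(rows: JobRow[]): QueueStats {
  const stats: QueueStats = { counts: {}, totalVramUsed: 0, jobsByType: {} };
  for (const row of rows) {
    stats.counts[row.status] = (stats.counts[row.status] ?? 0) + 1;
    stats.jobsByType[row.type] = (stats.jobsByType[row.type] ?? 0) + 1;
    // Only running jobs hold VRAM, matching the worker's resource check.
    if (row.status === 'running') stats.totalVramUsed += row.vramRequired;
  }
  return stats;
}
```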

Admin Workflow

Viewing Job Queue

  1. Navigate to Media → Jobs in admin sidebar
  2. Table displays all jobs with:
    • Job type icon
    • Status badge (color-coded)
    • Progress bar
    • Priority indicator
    • Resource category
    • Created/started/completed times
  3. Use filters at top:
    • Status dropdown (All / Pending / Running / Completed / Failed)
    • Type dropdown (job type)
    • Resource dropdown (CPU / GPU Encode / GPU AI)

Creating Jobs Manually

Option 1: From Library Page

  1. Select video in library table
  2. Click "Actions" dropdown
  3. Select action:
    • "Re-encode for Streaming"
    • "Generate Thumbnail"
    • "Validate Metadata"
    • "Move to Directory"
  4. Confirm job creation
  5. Redirected to Jobs page showing new job

Option 2: From Jobs Page

  1. Click "Create Job" button
  2. Modal opens with form:
    • Type dropdown (15+ job types)
    • Video selector (search by title/filename)
    • Priority slider (1-10)
    • Parameters JSON editor (advanced)
  3. Click "Create"
  4. Job appears in pending queue

Monitoring Job Progress

Real-Time Updates:

  1. Jobs page polls API every 2 seconds for running jobs
  2. Progress bars update smoothly (0-100%)
  3. Status badges change color:
    • Grey: Pending
    • Blue: Queued
    • Yellow: Running
    • Green: Completed
    • Red: Failed

Detailed Logs:

  1. Click job row to expand details panel
  2. View execution log in monospace text area
  3. Log updates in real time while the job is running
  4. Example log output:
[2026-02-13 10:30:15] Starting re-encode job
[2026-02-13 10:30:16] Input: /media/local/inbox/original.mp4
[2026-02-13 10:30:16] Output: /media/local/playback/encoded.mp4
[2026-02-13 10:30:17] FFmpeg command: ffmpeg -i /media/local/inbox/original.mp4 -c:v libx264 -preset medium -crf 23 -c:a aac -b:a 128k /media/local/playback/encoded.mp4
[2026-02-13 10:30:20] Progress: 5%
[2026-02-13 10:30:25] Progress: 15%
[2026-02-13 10:30:30] Progress: 25%
...
[2026-02-13 10:45:00] Progress: 100%
[2026-02-13 10:45:01] Re-encode completed successfully
[2026-02-13 10:45:02] Output file size: 25.3 MB

Retrying Failed Jobs

  1. Filter for Failed jobs
  2. Click job row to view error log
  3. Identify failure reason (e.g., "FFmpeg error: codec not supported")
  4. Fix underlying issue (install codec, fix file path, etc.)
  5. Click "Retry" button
  6. Job resets to pending status
  7. Worker picks up job again

Auto-Retry:

Jobs automatically retry up to 3 times with exponential backoff:

  • 1st retry: after 2 minutes
  • 2nd retry: after 4 minutes
  • 3rd retry: after 8 minutes

Cancelling Jobs

  1. Find job in pending/queued/running state
  2. Click "Cancel" button
  3. Confirm cancellation dialog
  4. Job marked as cancelled
  5. If running, worker stops after current chunk completes

Pausing/Resuming Jobs

Use Case: Temporarily stop low-priority jobs to free resources for urgent tasks

  1. Select low-priority pending job
  2. Click "Pause" button
  3. Job status changes to paused (greyed out)
  4. Worker skips paused jobs
  5. When ready, click "Resume"
  6. Job returns to pending queue

Job Type Details

Scan Jobs (scan, public_scan)

Purpose: Scan filesystem directory for new videos and create database records

Parameters:

{
  "directoryType": "videos",
  "skipExisting": true
}

Process:

  1. Read directory /media/local/library/{directoryType}/
  2. Filter for video extensions (.mp4, .mov, etc.)
  3. Check each file against database (by path)
  4. Create records for new files
  5. Run FFprobe on new files
  6. Update progress: files processed / total files

Typical Duration: 2-30 seconds (depends on file count)
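Steps 2 and 6 of the scan can be sketched as small helpers (the extension list beyond .mp4/.mov is an assumption; the source only says "etc."):

```typescript
// Step 2: filter a directory listing down to video files.
const VIDEO_EXTENSIONS = new Set(['.mp4', '.mov', '.mkv', '.webm', '.avi']);

function isVideoFile(filename: string): boolean {
  const dot = filename.lastIndexOf('.');
  if (dot < 0) return false;
  return VIDEO_EXTENSIONS.has(filename.slice(dot).toLowerCase());
}

// Step 6: progress as files processed / total files, reported 0-100.
function scanProgress(processed: number, total: number): number {
  if (total === 0) return 100; // empty directory: nothing left to do
  return Math.round((processed / total) * 100);
}
```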


Validation Jobs (validate)

Purpose: Re-run FFprobe to refresh video metadata

Parameters:

{
  "videoId": "660e8400-e29b-41d4-a716-446655440001"
}

Process:

  1. Fetch video record from database
  2. Build full file path
  3. Run FFprobe extraction
  4. Update database with fresh metadata
  5. Mark video as valid/invalid based on result

Typical Duration: 100-500ms per video


Re-encode Jobs (reencode_streaming)

Purpose: Convert video to web-optimized format (H.264, web-friendly profile)

Parameters:

{
  "videoId": "660e8400-e29b-41d4-a716-446655440001",
  "targetBitrate": 2000,
  "preset": "medium",
  "crf": 23
}

FFmpeg Command:

ffmpeg -i /media/local/inbox/original.mp4 \
  -c:v libx264 \
  -preset medium \
  -crf 23 \
  -maxrate 2000k \
  -bufsize 4000k \
  -c:a aac \
  -b:a 128k \
  -movflags +faststart \
  /media/local/playback/encoded.mp4

Process:

  1. Validate input file exists
  2. Build FFmpeg command
  3. Start encoding process
  4. Parse FFmpeg progress output
  5. Update job progress every ~5%
  6. Create new video record for encoded file
  7. Update original video reencodeJobId reference

Typical Duration: 5-30 minutes (depends on video length and resolution)
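Step 4, parsing FFmpeg's progress output, can be sketched as follows, assuming the input duration is already known from FFprobe (function names are illustrative):

```typescript
// FFmpeg's stderr status lines contain "time=HH:MM:SS.ms"; convert that
// elapsed time to seconds.
function parseFfmpegTime(line: string): number | null {
  const match = line.match(/time=(\d+):(\d+):(\d+(?:\.\d+)?)/);
  if (!match) return null;
  const [, h, m, s] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// Map elapsed encode time to a 0-100 progress value against the known
// input duration, clamped at 100.
function encodeProgress(stderrLine: string, durationSeconds: number): number | null {
  const elapsed = parseFfmpegTime(stderrLine);
  if (elapsed === null || durationSeconds <= 0) return null;
  return Math.min(100, Math.round((elapsed / durationSeconds) * 100));
}
```

The worker would call this on each stderr chunk and only write a DB update when the value crosses the next ~5% threshold.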


Compilation Jobs (compile_random, compile_quad, compile_mega)

Purpose: Merge multiple videos into single compilation

Parameters (Random):

{
  "count": 10,
  "minDuration": 30,
  "maxDuration": 120,
  "orientation": "landscape",
  "outputPath": "compilations/random-001.mp4"
}

Process:

  1. Query database for videos matching criteria (orientation, duration range)
  2. Randomly select count videos
  3. Build FFmpeg concat demuxer file list
  4. Run FFmpeg compilation
  5. Create new video record for compilation
  6. Update progress based on FFmpeg output
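Step 3's concat demuxer file list can be generated with a small helper; a sketch (the single-quote escaping follows FFmpeg's concat file syntax, where an embedded quote is written as `'\''`):

```typescript
// Build the concat demuxer input file contents: one "file '...'" line
// per selected video, with embedded single quotes escaped.
function concatFileList(paths: string[]): string {
  return paths
    .map((p) => `file '${p.replace(/'/g, `'\\''`)}'`)
    .join('\n');
}
```

The resulting text is written to a temp file and passed to FFmpeg as `-f concat -safe 0 -i list.txt`.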

Quad Compilation (4-up grid):

ffmpeg -i video1.mp4 -i video2.mp4 -i video3.mp4 -i video4.mp4 \
  -filter_complex "[0:v][1:v][2:v][3:v]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v]" \
  -map "[v]" \
  output.mp4

Typical Duration: 10-60 minutes


Digest Generation (digest_generate)

Purpose: AI-powered video digest creation (future feature)

Parameters:

{
  "videoId": "660e8400-e29b-41d4-a716-446655440001",
  "targetLength": 60,
  "includeHighlights": true
}

Process (Planned):

  1. Extract frames at 1 FPS
  2. Run AI scene detection
  3. Identify highlights (action, faces, motion)
  4. Select best segments totaling target length
  5. Compile segments into digest video

GPU AI Required: 8GB VRAM


Thumbnail Generation (thumbnail_generate)

Purpose: Extract thumbnail image from video

Parameters:

{
  "videoId": "660e8400-e29b-41d4-a716-446655440001",
  "timestamp": 5,
  "width": 640
}

FFmpeg Command:

ffmpeg -i /media/local/library/videos/sample.mp4 \
  -ss 00:00:05 \
  -vframes 1 \
  -vf scale=640:-1 \
  /media/local/thumbnails/sample.jpg

Process:

  1. Seek to timestamp (default: 25% into video)
  2. Extract single frame
  3. Scale to width (preserve aspect ratio)
  4. Save as JPEG
  5. Update video record with thumbnailPath

Typical Duration: 1-5 seconds
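The `-ss` argument and the default seek point (step 1) can be computed with small helpers; a minimal sketch:

```typescript
// Format a seek offset in seconds as HH:MM:SS for FFmpeg's -ss flag.
function toTimestamp(totalSeconds: number): string {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = Math.floor(totalSeconds % 60);
  const pad = (n: number) => String(n).padStart(2, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}`;
}

// Default seek point when no timestamp parameter is given: 25% into the video.
function defaultSeek(durationSeconds: number): number {
  return Math.floor(durationSeconds * 0.25);
}
```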


Code Examples

Create Re-encode Job

// api/src/modules/media/routes/jobs.routes.ts
import { db } from '@/modules/media/db';
import { jobs, videos } from '@/modules/media/db/schema';

app.post('/api/media/jobs/reencode', async (req, reply) => {
  const { videoId, targetBitrate = 2000, preset = 'medium', crf = 23 } = req.body;

  // Fetch video
  const [video] = await db
    .select()
    .from(videos)
    .where(eq(videos.id, videoId))
    .limit(1);

  if (!video) {
    return reply.code(404).send({ error: 'Video not found' });
  }

  // Create job
  const [job] = await db
    .insert(jobs)
    .values({
      type: 'reencode_streaming',
      status: 'pending',
      params: {
        videoId,
        inputPath: video.path,
        outputPath: `playback/${video.filename}`,
        targetBitrate,
        preset,
        crf,
      },
      resourceCategory: 'gpu_encode',
      vramRequired: 4000,
      priority: 5,
    })
    .returning();

  reply.send(job);
});

Job Worker (Polling Loop)

// api/src/modules/media/services/job-worker.service.ts
import path from 'node:path';
import { exec } from 'node:child_process';
import { promisify } from 'node:util';
import { eq, sql } from 'drizzle-orm';
import { db } from '@/modules/media/db';
import { jobs } from '@/modules/media/db/schema';

const execAsync = promisify(exec);

export class JobWorkerService {
  private polling = false;

  async start() {
    this.polling = true;
    console.log('Job worker started');

    while (this.polling) {
      try {
        await this.processNextJob();
      } catch (error) {
        console.error('Job worker error:', error);
      }

      // Wait 5 seconds before next poll
      await new Promise((resolve) => setTimeout(resolve, 5000));
    }
  }

  async stop() {
    this.polling = false;
    console.log('Job worker stopped');
  }

  private async processNextJob() {
    // Find pending jobs (highest priority first), skipping any still
    // inside a retry backoff window (retryAfter in the future)
    const candidates = await db
      .select()
      .from(jobs)
      .where(eq(jobs.status, 'pending'))
      .orderBy(jobs.priority, jobs.createdAt)
      .limit(10);

    const now = Date.now();
    const job = candidates.find(
      (j) => !j.retryAfter || j.retryAfter.getTime() <= now
    );

    if (!job) {
      return; // No runnable jobs in queue
    }

    // Check resource availability
    const canRun = await this.checkResources(job);
    if (!canRun) {
      // Update waiting reason
      await db
        .update(jobs)
        .set({ waitingReason: 'Insufficient resources' })
        .where(eq(jobs.id, job.id));
      return;
    }

    // Start job
    await this.executeJob(job);
  }

  private async checkResources(job: any): Promise<boolean> {
    // Get running jobs
    const runningJobs = await db
      .select()
      .from(jobs)
      .where(eq(jobs.status, 'running'));

    // Calculate total VRAM used
    const totalVramUsed = runningJobs.reduce(
      (sum, j) => sum + (j.vramRequired || 0),
      0
    );

    const TOTAL_VRAM = 16000; // 16GB GPU
    const available = TOTAL_VRAM - totalVramUsed;

    if (job.vramRequired && job.vramRequired > available) {
      return false; // Not enough VRAM
    }

    // Check concurrent job limits by category
    const categoryCount = runningJobs.filter(
      (j) => j.resourceCategory === job.resourceCategory
    ).length;

    const limits = {
      cpu: 5,
      gpu_encode: 2,
      gpu_ai: 1,
    };

    if (categoryCount >= limits[job.resourceCategory as keyof typeof limits]) {
      return false; // Category limit reached
    }

    return true; // Resources available
  }

  private async executeJob(job: any) {
    // Mark as running
    await db
      .update(jobs)
      .set({
        status: 'running',
        startedAt: new Date(),
        waitingReason: null,
      })
      .where(eq(jobs.id, job.id));

    try {
      // Execute job based on type
      switch (job.type) {
        case 'reencode_streaming':
          await this.executeReencode(job);
          break;
        case 'scan':
          await this.executeScan(job);
          break;
        case 'thumbnail_generate':
          await this.executeThumbnail(job);
          break;
        // ... other job types
      }

      // Mark as completed
      await db
        .update(jobs)
        .set({
          status: 'completed',
          progress: 100,
          completedAt: new Date(),
        })
        .where(eq(jobs.id, job.id));
    } catch (error: any) {
      // Mark as failed
      await db
        .update(jobs)
        .set({
          status: 'failed',
          log: (job.log || '') + `\n\n--- ERROR ---\n${error.message}`,
        })
        .where(eq(jobs.id, job.id));

      // Schedule retry if under max retries
      if (job.retryCount < job.maxRetries) {
        const retryDelay = Math.pow(2, job.retryCount + 1) * 60 * 1000; // Exponential backoff: 2, 4, 8 minutes
        await db
          .update(jobs)
          .set({
            status: 'pending',
            retryCount: job.retryCount + 1,
            retryAfter: new Date(Date.now() + retryDelay),
          })
          .where(eq(jobs.id, job.id));
      }
    }
  }

  private async executeReencode(job: any) {
    const { inputPath, outputPath, targetBitrate, preset, crf } = job.params;

    const inputFull = path.join(process.env.MEDIA_LIBRARY_PATH!, inputPath);
    const outputFull = path.join(process.env.MEDIA_LIBRARY_PATH!, outputPath);

    const command = `ffmpeg -i "${inputFull}" -c:v libx264 -preset ${preset} -crf ${crf} -maxrate ${targetBitrate}k -bufsize ${targetBitrate * 2}k -c:a aac -b:a 128k -movflags +faststart "${outputFull}"`;

    await this.appendLog(job.id, `Starting re-encode\nCommand: ${command}`);

    // Execute FFmpeg (simplified - real implementation uses spawn for progress parsing)
    await execAsync(command);

    await this.appendLog(job.id, 'Re-encode completed successfully');
  }

  private async appendLog(jobId: string, message: string) {
    const timestamp = new Date().toISOString();
    const logEntry = `[${timestamp}] ${message}`;

    await db
      .update(jobs)
      .set({
        log: sql`${jobs.log} || E'\n' || ${logEntry}`,
      })
      .where(eq(jobs.id, jobId));
  }
}

// Start worker
export const jobWorker = new JobWorkerService();
jobWorker.start();

Frontend: Jobs Page

// admin/src/pages/media/MediaJobsPage.tsx
import { Table, Tag, Progress, Button, Space, Select, message } from 'antd';
import { useEffect, useState } from 'react';
import { mediaApi } from '@/lib/media-api';

export default function MediaJobsPage() {
  const [jobs, setJobs] = useState<any[]>([]);
  const [loading, setLoading] = useState(false);
  const [filter, setFilter] = useState({ status: undefined, type: undefined });
  const [polling, setPolling] = useState(true);

  const fetchJobs = async () => {
    setLoading(true);
    try {
      const { data } = await mediaApi.get('/api/media/jobs', {
        params: filter,
      });
      setJobs(data.data);
    } catch (error) {
      console.error('Failed to fetch jobs:', error);
    } finally {
      setLoading(false);
    }
  };

  useEffect(() => {
    fetchJobs();
  }, [filter]);

  // Poll for running jobs every 2 seconds
  useEffect(() => {
    if (!polling) return;

    const interval = setInterval(() => {
      const hasRunning = jobs.some((j: any) => j.status === 'running');
      if (hasRunning) {
        fetchJobs();
      }
    }, 2000);

    return () => clearInterval(interval);
  }, [polling, jobs]);

  const handleRetry = async (id: string) => {
    try {
      await mediaApi.post(`/api/media/jobs/${id}/retry`);
      message.success('Job queued for retry');
      fetchJobs();
    } catch (error) {
      message.error('Retry failed');
    }
  };

  const handleCancel = async (id: string) => {
    try {
      await mediaApi.post(`/api/media/jobs/${id}/cancel`);
      message.success('Job cancelled');
      fetchJobs();
    } catch (error) {
      message.error('Cancel failed');
    }
  };

  const statusColors: Record<string, string> = {
    pending: 'default',
    queued: 'blue',
    running: 'processing',
    completed: 'success',
    failed: 'error',
    cancelled: 'default',
    paused: 'warning',
  };

  const columns = [
    {
      title: 'Type',
      dataIndex: 'type',
      width: 150,
      render: (type: string) => <span style={{ fontFamily: 'monospace' }}>{type}</span>,
    },
    {
      title: 'Status',
      dataIndex: 'status',
      width: 100,
      render: (status: string) => <Tag color={statusColors[status]}>{status.toUpperCase()}</Tag>,
    },
    {
      title: 'Progress',
      dataIndex: 'progress',
      width: 150,
      render: (progress: number, record: any) => (
        record.status === 'running' ? (
          <Progress percent={progress} size="small" status="active" />
        ) : record.status === 'completed' ? (
          <Progress percent={100} size="small" status="success" />
        ) : record.status === 'failed' ? (
          <Progress percent={progress} size="small" status="exception" />
        ) : (
          <Progress percent={progress} size="small" />
        )
      ),
    },
    {
      title: 'Resource',
      dataIndex: 'resourceCategory',
      width: 120,
    },
    {
      title: 'Priority',
      dataIndex: 'priority',
      width: 80,
      render: (priority: number) => (
        <Tag color={priority <= 3 ? 'red' : priority <= 6 ? 'orange' : 'default'}>
          {priority}
        </Tag>
      ),
    },
    {
      title: 'Created',
      dataIndex: 'createdAt',
      width: 150,
      render: (date: string) => new Date(date).toLocaleString(),
    },
    {
      title: 'Actions',
      width: 200,
      render: (_: any, record: any) => (
        <Space>
          {record.status === 'failed' && (
            <Button size="small" onClick={() => handleRetry(record.id)}>
              Retry
            </Button>
          )}
          {['pending', 'queued', 'running'].includes(record.status) && (
            <Button size="small" danger onClick={() => handleCancel(record.id)}>
              Cancel
            </Button>
          )}
          <Button size="small" onClick={() => window.open(`/app/media/jobs/${record.id}`, '_blank')}>
            View Log
          </Button>
        </Space>
      ),
    },
  ];

  return (
    <div>
      <Space style={{ marginBottom: 16 }}>
        <Select
          placeholder="Filter by status"
          style={{ width: 150 }}
          onChange={(value) => setFilter({ ...filter, status: value })}
          allowClear
        >
          <Select.Option value="pending">Pending</Select.Option>
          <Select.Option value="running">Running</Select.Option>
          <Select.Option value="completed">Completed</Select.Option>
          <Select.Option value="failed">Failed</Select.Option>
        </Select>

        <Select
          placeholder="Filter by type"
          style={{ width: 200 }}
          onChange={(value) => setFilter({ ...filter, type: value })}
          allowClear
        >
          <Select.Option value="scan">Scan</Select.Option>
          <Select.Option value="reencode_streaming">Re-encode</Select.Option>
          <Select.Option value="compile_random">Compilation</Select.Option>
        </Select>

        <Button onClick={() => setPolling(!polling)}>
          {polling ? 'Stop Auto-Refresh' : 'Start Auto-Refresh'}
        </Button>
      </Space>

      <Table
        columns={columns}
        dataSource={jobs}
        loading={loading}
        rowKey="id"
        pagination={{ pageSize: 20 }}
      />
    </div>
  );
}

Troubleshooting

Problem: Jobs Stuck in Pending

Symptoms:

  • Jobs created but never start
  • Status remains "pending" for hours
  • No "running" jobs visible

Solutions:

  1. Check that the worker process is running:
docker compose ps media-api
# Should show "Up" status

docker compose logs media-api | grep "Job worker"
# Should show "Job worker started"
  2. Manually trigger worker:
# Restart media-api container
docker compose restart media-api

# Worker starts automatically on container boot
  3. Check worker logs for errors:
docker compose logs -f media-api | grep ERROR
# Look for database connection errors, permission issues
  4. Verify database connection:
# Count pending jobs directly in the database
docker compose exec v2-postgres psql -U changemaker -d v2_changemaker \
  -c "SELECT COUNT(*) FROM jobs WHERE status='pending';"

Problem: Job Fails Immediately

Symptoms:

  • Job status changes from pending → running → failed within seconds
  • No meaningful progress
  • Error in log: "Command not found" or "Permission denied"

Solutions:

  1. Check job log in database:
SELECT log FROM jobs WHERE id = 'JOB_ID';
  2. Verify FFmpeg installed:
docker compose exec media-api which ffmpeg
# Should output: /usr/bin/ffmpeg

docker compose exec media-api ffmpeg -version
  3. Check that file paths are valid:
# Verify input file exists
docker compose exec media-api ls -la /media/local/library/inbox/original.mp4

# Check output directory writable
docker compose exec media-api touch /media/local/playback/test.txt
  4. Test FFmpeg command manually:
# Copy command from job log, run manually
docker compose exec media-api ffmpeg -i /media/local/inbox/test.mp4 -c:v libx264 /media/local/playback/test-output.mp4

Problem: Re-encode Job Hangs at Same Progress

Symptoms:

  • Job progress reaches 25%, 50%, or 75% then stops updating
  • Status remains "running" for hours
  • No CPU/GPU activity visible

Solutions:

  1. Check that the FFmpeg process is still running:
docker compose exec media-api ps aux | grep ffmpeg
# Should show ffmpeg process

# If not running, worker crashed
docker compose logs media-api --tail 100
  2. Kill hung FFmpeg process:
docker compose exec media-api pkill -9 ffmpeg

# Job will fail and can be retried
  3. Check disk space:
df -h /media/local/playback
# If 100% full, encoding fails

# Free space
docker compose exec media-api sh -c 'rm /media/local/playback/*.partial'
  4. Increase the FFmpeg timeout (for very large files):
// api/src/modules/media/services/job-worker.service.ts
const FFMPEG_TIMEOUT = 3600000; // 1 hour (from 30 minutes)

Problem: GPU Out of Memory Errors

Symptoms:

  • Multiple GPU jobs running simultaneously
  • Error in log: "CUDA out of memory" or "Cannot allocate memory"
  • System becomes unresponsive

Solutions:

  1. Check total VRAM available:
nvidia-smi
# Shows GPU memory usage

# Should show < 16GB used (adjust based on your GPU)
  2. Reduce concurrent GPU job limit:
// api/src/modules/media/services/job-worker.service.ts
const limits = {
  cpu: 5,
  gpu_encode: 1,  // Reduced from 2
  gpu_ai: 1,
};
  3. Increase VRAM requirements for jobs:
// Jobs require more VRAM than specified
// Update job creation to use higher vramRequired values
{
  type: 'reencode_streaming',
  vramRequired: 6000,  // Increased from 4000
}
  4. Kill running GPU jobs:
# Stop all media jobs
docker compose exec media-api pkill -9 ffmpeg

# Update stuck jobs to failed status
docker compose exec v2-postgres psql -U changemaker -d v2_changemaker \
  -c "UPDATE jobs SET status='failed' WHERE status='running';"

Performance Considerations

Job Queue Throughput

Scaling Factors:

  • CPU jobs: 5 concurrent = ~10-20 jobs/minute (scans, validations)
  • GPU encode: 2 concurrent = ~4-8 videos/hour (depends on length)
  • GPU AI: 1 concurrent = ~2-6 videos/hour (depends on complexity)

Bottlenecks:

  1. GPU Memory — Limits concurrent GPU jobs
  2. Disk I/O — Reading/writing large video files
  3. CPU — FFmpeg encoding uses all available cores

Optimization:

  • Distribute workers across multiple machines — Each machine runs separate worker process
  • Use job priority — Urgent jobs (priority 1-3) run first
  • Batch similar jobs — Group scan jobs, re-encode jobs, etc. for efficiency

Database Performance

Job Queue Index:

CREATE INDEX idx_jobs_status_priority ON jobs(status, priority, created_at);

Query Performance:

  • Find next pending job: ~1-5ms (with index)
  • Update job status: ~2-10ms
  • Fetch job logs: ~5-20ms

Optimization:

  • Partition jobs table by date — Move old completed/failed jobs to archive table
  • Limit log size — Truncate logs > 10KB to prevent bloat
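The log-size cap can be enforced in the worker before each write; a sketch assuming the tail of the log is the part worth keeping (the marker text and exact cut-off are implementation choices):

```typescript
// Cap a job log at ~10 KB, keeping the most recent entries, which matter
// most for debugging. Measured in bytes, not characters.
const MAX_LOG_BYTES = 10 * 1024;

function truncateLog(log: string): string {
  if (Buffer.byteLength(log, 'utf8') <= MAX_LOG_BYTES) return log;
  // Keep the last MAX_LOG_BYTES bytes and mark the cut.
  const tail = Buffer.from(log, 'utf8').subarray(-MAX_LOG_BYTES).toString('utf8');
  return `--- LOG TRUNCATED ---\n${tail}`;
}
```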

Monitoring & Observability

Prometheus Metrics

// api/src/utils/metrics.ts
import { Counter, Gauge } from 'prom-client';

export const mediaJobsTotal = new Counter({
  name: 'media_jobs_total',
  help: 'Total media jobs created',
  labelNames: ['type', 'status'],
});

export const mediaJobsPending = new Gauge({
  name: 'media_jobs_pending',
  help: 'Number of pending media jobs',
});

export const mediaJobsRunning = new Gauge({
  name: 'media_jobs_running',
  help: 'Number of running media jobs',
  labelNames: ['resourceCategory'],
});

export const mediaVramUsed = new Gauge({
  name: 'media_vram_used_mb',
  help: 'Total VRAM used by running jobs (MB)',
});

// Update metrics in worker
mediaJobsPending.set(pendingCount);
mediaJobsRunning.set({ resourceCategory: 'gpu_encode' }, gpuEncodeCount);
mediaVramUsed.set(totalVramUsed);

Grafana Dashboard Panel

Job Queue Status:

# Pending jobs count
media_jobs_pending

# Running jobs by category
sum(media_jobs_running) by (resourceCategory)

# VRAM usage percentage
(media_vram_used_mb / 16000) * 100

Alert Rules:

# configs/prometheus/alerts.yml
groups:
  - name: media_jobs
    rules:
      - alert: MediaJobQueueBacklog
        expr: media_jobs_pending > 50
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Media job queue backlog"
          description: "{{ $value }} jobs pending for 30+ minutes"

      - alert: MediaJobsStuckRunning
        expr: sum(media_jobs_running) == 0 and media_jobs_pending > 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Media jobs stuck"
          description: "Jobs pending but worker not processing"

Backend Documentation

  • Job Worker: backend/modules/media/job-worker.md — Worker process implementation
  • Job Processors: backend/modules/media/processors/ — Individual job type processors (reencode, scan, etc.)
  • Jobs Routes: backend/modules/media/jobs.md — API endpoints for job management

Frontend Documentation

  • Jobs Page: frontend/pages/media/jobs.md — Job queue monitoring UI
  • Job Detail Modal: frontend/components/media/job-detail.md — Log viewer component

Feature Documentation

  • Video Library: features/media/video-library.md — Triggering jobs from library actions
  • Upload System: features/media/upload.md — Post-upload job creation

Next Steps

After mastering the job queue:

  1. Create Custom Jobs — Implement new job types for domain-specific processing
  2. Optimize Scheduling — Tune resource limits and priority settings for your workload
  3. Monitor Performance — Set up Grafana dashboards and alerts for job queue health
  4. Distributed Workers — Scale horizontally by running workers on multiple machines

Hands-On Practice:

# 1. Create re-encode job
curl -X POST http://localhost:4100/api/media/jobs \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "reencode_streaming",
    "params": { "videoId": "VIDEO_ID", "targetBitrate": 2000 },
    "priority": 5
  }'

# 2. Monitor job progress
watch -n 2 'curl -s http://localhost:4100/api/media/jobs/JOB_ID | jq ".progress"'

# 3. View job logs
curl http://localhost:4100/api/media/jobs/JOB_ID | jq -r ".log"

# 4. Check queue stats
curl http://localhost:4100/api/media/jobs/stats | jq

Last Updated: 2026-02-13 Version: V2.0 Maintainer: Changemaker Lite Team