# Media Job Queue System

## Overview

The Media Job Queue System provides asynchronous background processing for CPU- and GPU-intensive video operations. Built on a custom job queue with resource-aware scheduling, it handles everything from directory scanning to AI-powered video analysis while maintaining system stability through resource category management.

**Key Features:**

- **Resource Categories** — Jobs classified by resource needs (CPU, GPU encode, GPU AI)
- **Priority Scheduling** — High-priority jobs processed first within same category
- **Job Types** — 15+ job types (compilation, encoding, digest generation, scene extraction, etc.)
- **Progress Tracking** — Real-time progress updates (0-100%)
- **Status Management** — Pending → Queued → Running → Completed/Failed lifecycle
- **Retry Logic** — Failed jobs can be retried with exponential backoff
- **Detailed Logging** — Execution logs for debugging and audit trail
- **Queue Management** — Pause, resume, cancel, and prioritize jobs
- **VRAM Awareness** — Prevents GPU memory exhaustion by tracking VRAM requirements

**Access Control:**

- Job viewing/management requires `SUPER_ADMIN` role
- Job creation can be triggered by admins or automated workflows

**Technology Stack:**

- **Database Queue** — PostgreSQL-backed job queue (no BullMQ for media)
- **Worker Process** — Node.js worker polling queue every 5 seconds
- **FFmpeg** — Video encoding and compilation
- **AI Integration** — Future support for scene detection and auto-tagging

---

## Architecture

```mermaid
flowchart TB
    subgraph "Job Creation"
        A1[Admin Action]
        A2[Automated Trigger]
        A3[Scheduled Task]
    end

    subgraph "Job Queue (PostgreSQL)"
        Q1[Pending Jobs]
        Q2[Queued Jobs]
        Q3[Running Jobs]
        Q4[Completed/Failed Jobs]
    end

    subgraph "Worker Process"
        W1[Job Poller<br/>Every 5s]
        W2[Resource Checker]
        W3[Job Executor]
        W4[Progress Updater]
    end

    subgraph "Processors"
        P1[CPU Jobs<br/>scan, validate]
        P2[GPU Encode<br/>reencode, compile]
        P3[GPU AI<br/>digest, tag, scene]
    end

    subgraph "Results"
        R1[Video Records Updated]
        R2[New Files Created]
        R3[Logs Written]
    end

    A1 --> Q1
    A2 --> Q1
    A3 --> Q1
    Q1 --> W1
    W1 --> W2
    W2 -->|Check Resources| Q2
    Q2 --> W3
    W3 --> P1
    W3 --> P2
    W3 --> P3
    W3 --> W4
    W4 --> Q3
    P1 --> R1
    P2 --> R2
    P3 --> R3
    Q3 --> Q4

    style Q1 fill:#f9f
    style Q3 fill:#ff9
    style Q4 fill:#9f9
```

**Workflow:**

1. **Job Creation** — Admin clicks the "Re-encode" button and the API creates a job record
2. **Queue Polling** — Worker checks for pending jobs every 5 seconds
3. **Resource Check** — Worker verifies sufficient VRAM/CPU available
4. **Job Execution** — Worker runs appropriate processor (FFmpeg, AI script, etc.)
5. **Progress Updates** — Worker updates job progress every ~5% completion
6. **Completion** — Worker marks job complete and logs results
7. **Retry on Failure** — Failed jobs can be retried with exponential backoff

---

## Database Model

### Jobs Table Schema

```typescript
// api/src/modules/media/db/schema.ts
import { pgTable, uuid, text, jsonb, integer, timestamp } from 'drizzle-orm/pg-core';

export const jobs = pgTable('jobs', {
  id: uuid('id').primaryKey().defaultRandom(),

  // Job Definition
  type: text('type').notNull(), // JobType enum: compilation, scan, reencode, etc.
  status: text('status').notNull().default('pending'), // JobStatus enum
  params: jsonb('params').$type<Record<string, unknown>>().notNull(), // Job-specific parameters

  // Progress Tracking
  progress: integer('progress').default(0), // 0-100
  log: text('log').default(''), // Execution log (append-only)

  // Scheduling
  priority: integer('priority').default(5), // 1 (highest) - 10 (lowest)
  queuePosition: integer('queue_position'), // Position in queue
  waitingReason: text('waiting_reason'), // Why job is waiting (e.g., "Insufficient VRAM")

  // Resource Management
  resourceCategory: text('resource_category').notNull(), // cpu | gpu_encode | gpu_ai
  vramRequired: integer('vram_required').default(0), // MB of VRAM needed

  // Timing
  createdAt: timestamp('created_at').defaultNow(),
  startedAt: timestamp('started_at'),
  completedAt: timestamp('completed_at'),

  // Retry Logic
  retryCount: integer('retry_count').default(0),
  maxRetries: integer('max_retries').default(3),
  retryAfter: timestamp('retry_after'), // Don't retry before this time
});
```

### Job Types Enum

| Type | Resource Category | VRAM (MB) | Description |
|------|------------------|-----------|-------------|
| `scan` | cpu | 0 | Scan directory for new videos |
| `public_scan` | cpu | 0 | Scan public gallery directory |
| `validate` | cpu | 0 | Validate video metadata (FFprobe) |
| `reencode_streaming` | gpu_encode | 4000 | Re-encode for web playback (H.264) |
| `compile_random` | gpu_encode | 2000 | Random video compilation |
| `compile_quad` | gpu_encode | 4000 | 4-up grid compilation |
| `compile_mega` | gpu_encode | 6000 | Large multi-video compilation |
| `compile_gif` | cpu | 0 | Create GIF from video |
| `digest_generate` | gpu_ai | 8000 | AI-powered video digest |
| `clip_generate` | gpu_ai | 6000 | Extract clips from digest |
| `highlight_generate` | gpu_ai | 8000 | Create highlight reel |
| `tag_generation` | gpu_ai | 6000 | AI auto-tagging |
| `scene_extract` | gpu_ai | 8000 | Scene detection and extraction |
| `thumbnail_generate` | cpu | 0 | Generate thumbnail from video |
| `move_to_library` | cpu | 0 | Move video from inbox to target directory |

### Job Status Enum

| Status | Description | Final State |
|--------|-------------|-------------|
| `pending` | Waiting to be picked up by worker | No |
| `queued` | Selected by worker, waiting for resources | No |
| `running` | Currently executing | No |
| `completed` | Finished successfully | Yes |
| `failed` | Execution failed (see log for details) | Yes |
| `cancelled` | Manually cancelled by admin | Yes |
| `paused` | Temporarily paused (can be resumed) | No |

### Resource Categories

| Category | Typical VRAM | Concurrent Limit | Use Cases |
|----------|-------------|------------------|-----------|
| `cpu` | 0 MB | 5 | Scanning, validation, simple encodes, GIF creation |
| `gpu_encode` | 2-6 GB | 2 | Video re-encoding, compilation, format conversion |
| `gpu_ai` | 6-12 GB | 1 | AI tagging, scene detection, digest generation, highlight extraction |

**VRAM Management:** The worker tracks total VRAM usage across running jobs and starts a new job only if its requirement fits in the remaining budget:

```typescript
import { eq } from 'drizzle-orm';
import { db } from '@/modules/media/db';
import { jobs } from '@/modules/media/db/schema';

// `newJob` is the candidate pending job; `startJob` launches it
const runningJobs = await db.select().from(jobs).where(eq(jobs.status, 'running'));
const totalVramUsed = runningJobs.reduce((sum, job) => sum + (job.vramRequired ?? 0), 0);

// Only start new job if VRAM available
const TOTAL_VRAM = 16000; // 16GB GPU
if (totalVramUsed + (newJob.vramRequired ?? 0) <= TOTAL_VRAM) {
  startJob(newJob);
}
```

---

## API Endpoints

All endpoints require the `SUPER_ADMIN` role.
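Client code that calls these endpoints can type the job objects they return. The sketch below is an assumption derived from the schema and the JSON examples in this section. The names `MediaJob`, `PaginatedJobs`, `JobStatus`, `ResourceCategory` and `isFinalStatus` are hypothetical and are not part of the API. `isFinalStatus` encodes the "Final State" column of the Job Status table, so a poller can use it to decide when to stop refreshing a job.

```typescript
// Hypothetical client-side types; field names follow the JSON examples in this document.
type JobStatus = 'pending' | 'queued' | 'running' | 'completed' | 'failed' | 'cancelled' | 'paused';

type ResourceCategory = 'cpu' | 'gpu_encode' | 'gpu_ai';

interface MediaJob {
  id: string;
  type: string; // e.g. "reencode_streaming"
  status: JobStatus;
  progress: number; // 0-100
  params: Record<string, unknown>;
  resourceCategory: ResourceCategory;
  vramRequired: number; // MB
  priority: number; // 1 (highest) to 10 (lowest)
  log?: string;
  retryCount?: number;
  maxRetries?: number;
  createdAt: string; // ISO 8601 timestamp
  startedAt?: string | null;
  completedAt?: string | null;
}

interface PaginatedJobs {
  data: MediaJob[];
  pagination: { page: number; limit: number; total: number; totalPages: number };
}

// Statuses marked "Final State: Yes" in the Job Status table
const FINAL_STATUSES: ReadonlySet<JobStatus> = new Set<JobStatus>(['completed', 'failed', 'cancelled']);

// True when the status is final, i.e. a client polling this job can stop
function isFinalStatus(status: JobStatus): boolean {
  return FINAL_STATUSES.has(status);
}
```

For example, a progress poller can keep refreshing `GET /api/media/jobs/:id` until `isFinalStatus(job.status)` returns true. Note that `paused` is deliberately treated as non-final because a paused job can be resumed.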
### List Jobs

```http
GET /api/media/jobs
```

**Query Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `page` | number | 1 | Page number |
| `limit` | number | 20 | Results per page |
| `status` | string | - | Filter by status (pending, running, completed, failed) |
| `type` | string | - | Filter by job type |
| `resourceCategory` | string | - | Filter by resource category |

**Response:**

```json
{
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "type": "reencode_streaming",
      "status": "running",
      "progress": 45,
      "resourceCategory": "gpu_encode",
      "vramRequired": 4000,
      "priority": 5,
      "params": {
        "videoId": "660e8400-e29b-41d4-a716-446655440001",
        "targetBitrate": 2000
      },
      "startedAt": "2026-02-13T10:30:00Z",
      "createdAt": "2026-02-13T10:25:00Z"
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "total": 156,
    "totalPages": 8
  }
}
```

---

### Get Job Details

```http
GET /api/media/jobs/:id
```

**Response:**

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "reencode_streaming",
  "status": "completed",
  "progress": 100,
  "log": "Starting re-encode...\nFFmpeg command: ffmpeg -i input.mp4 -c:v h264 -preset medium -crf 23 output.mp4\nProgress: 25%\nProgress: 50%\nProgress: 75%\nProgress: 100%\nCompleted successfully",
  "params": {
    "videoId": "660e8400-e29b-41d4-a716-446655440001",
    "inputPath": "inbox/original.mp4",
    "outputPath": "playback/encoded.mp4",
    "targetBitrate": 2000
  },
  "resourceCategory": "gpu_encode",
  "vramRequired": 4000,
  "priority": 5,
  "retryCount": 0,
  "maxRetries": 3,
  "createdAt": "2026-02-13T10:25:00Z",
  "startedAt": "2026-02-13T10:30:00Z",
  "completedAt": "2026-02-13T10:45:00Z"
}
```

---

### Create Job

```http
POST /api/media/jobs
```

**Request Body:**

```json
{
  "type": "reencode_streaming",
  "params": {
    "videoId": "660e8400-e29b-41d4-a716-446655440001",
    "targetBitrate": 2000
  },
  "priority": 5,
  "resourceCategory": "gpu_encode",
  "vramRequired": 4000
}
```

**Response:**

```json
{
  "id": "770e8400-e29b-41d4-a716-446655440002",
  "type": "reencode_streaming",
  "status": "pending",
  "progress": 0,
  "createdAt": "2026-02-13T11:00:00Z"
}
```

---

### Retry Failed Job

```http
POST /api/media/jobs/:id/retry
```

**Response:**

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "retryCount": 1,
  "retryAfter": null,
  "log": "Starting re-encode...\n[Previous logs...]\n--- RETRY ATTEMPT 1 ---\n"
}
```

**Retry Logic:**

- Failed jobs can be retried up to `maxRetries` times (default: 3)
- Exponential backoff: retry number N waits `2^N` minutes (2, 4, then 8 minutes)
- Retry resets status to `pending` and appends a retry marker to the log

---

### Cancel Job

```http
POST /api/media/jobs/:id/cancel
```

**Response:**

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "cancelled",
  "log": "Starting re-encode...\nProgress: 25%\n--- JOB CANCELLED BY ADMIN ---"
}
```

**Notes:**

- Running jobs cannot be cancelled immediately (the worker must finish the current chunk)
- Pending/queued jobs are cancelled instantly

---

### Pause/Resume Job

```http
POST /api/media/jobs/:id/pause
POST /api/media/jobs/:id/resume
```

**Pause Response:**

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "paused"
}
```

**Resume Response:**

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending"
}
```

---

### Queue Statistics

```http
GET /api/media/jobs/stats
```

**Response:**

```json
{
  "pending": 12,
  "queued": 2,
  "running": 3,
  "completed": 1458,
  "failed": 23,
  "paused": 1,
  "totalVramUsed": 12000,
  "totalVramAvailable": 16000,
  "averageProcessingTime": 245,
  "jobsByType": {
    "reencode_streaming": 45,
    "scan": 8,
    "compile_random": 12
  }
}
```

---

## Admin Workflow

### Viewing Job Queue

1. Navigate to **Media → Jobs** in the admin sidebar
2. The table displays all jobs with:
   - Job type icon
   - Status badge (color-coded)
   - Progress bar
   - Priority indicator
   - Resource category
   - Created/started/completed times
3. Use the filters at the top:
   - **Status** dropdown (All / Pending / Running / Completed / Failed)
   - **Type** dropdown (job type)
   - **Resource** dropdown (CPU / GPU Encode / GPU AI)

### Creating Jobs Manually

**Option 1: From Library Page**

1. Select a video in the library table
2. Click the **"Actions"** dropdown
3. Select an action:
   - "Re-encode for Streaming"
   - "Generate Thumbnail"
   - "Validate Metadata"
   - "Move to Directory"
4. Confirm job creation
5. You are redirected to the Jobs page showing the new job

**Option 2: From Jobs Page**

1. Click the **"Create Job"** button
2. A modal opens with the form:
   - **Type** dropdown (15+ job types)
   - **Video** selector (search by title/filename)
   - **Priority** slider (1-10)
   - **Parameters** JSON editor (advanced)
3. Click **"Create"**
4. The job appears in the pending queue

### Monitoring Job Progress

**Real-Time Updates:**

1. The Jobs page polls the API every 2 seconds while jobs are running
2. Progress bars update smoothly (0-100%)
3. Status badges change color:
   - Grey: Pending
   - Blue: Queued
   - Yellow: Running
   - Green: Completed
   - Red: Failed

**Detailed Logs:**

1. Click a job row to expand the details panel
2. View the execution log in a monospace text area
3. The log updates in real time while the job is running
4. Example log output:

   ```
   [2026-02-13 10:30:15] Starting re-encode job
   [2026-02-13 10:30:16] Input: /media/local/inbox/original.mp4
   [2026-02-13 10:30:16] Output: /media/local/playback/encoded.mp4
   [2026-02-13 10:30:17] FFmpeg command: ffmpeg -i /media/local/inbox/original.mp4 -c:v libx264 -preset medium -crf 23 -c:a aac -b:a 128k /media/local/playback/encoded.mp4
   [2026-02-13 10:30:20] Progress: 5%
   [2026-02-13 10:30:25] Progress: 15%
   [2026-02-13 10:30:30] Progress: 25%
   ...
   [2026-02-13 10:45:00] Progress: 100%
   [2026-02-13 10:45:01] Re-encode completed successfully
   [2026-02-13 10:45:02] Output file size: 25.3 MB
   ```

### Retrying Failed Jobs

1. Filter for **Failed** jobs
2. Click a job row to view the error log
3. Identify the failure reason (e.g., "FFmpeg error: codec not supported")
4. Fix the underlying issue (install codec, fix file path, etc.)
5. Click the **"Retry"** button
6. The job resets to pending status
7. The worker picks up the job again

**Auto-Retry:** Jobs automatically retry up to 3 times with exponential backoff:

- 1st retry: after 2 minutes
- 2nd retry: after 4 minutes
- 3rd retry: after 8 minutes

### Cancelling Jobs

1. Find a job in the pending/queued/running state
2. Click the **"Cancel"** button
3. Confirm the cancellation dialog
4. The job is marked as cancelled
5. If it is running, the worker stops after the current chunk completes

### Pausing/Resuming Jobs

**Use Case:** Temporarily stop low-priority jobs to free resources for urgent tasks.

1. Select a low-priority pending job
2. Click the **"Pause"** button
3. The job status changes to paused (greyed out)
4. The worker skips paused jobs
5. When ready, click **"Resume"**
6. The job returns to the pending queue

---

## Job Type Details

### Scan Jobs (`scan`, `public_scan`)

**Purpose:** Scan a filesystem directory for new videos and create database records

**Parameters:**

```json
{
  "directoryType": "videos",
  "skipExisting": true
}
```

**Process:**

1. Read directory `/media/local/library/{directoryType}/`
2. Filter for video extensions (`.mp4`, `.mov`, etc.)
3. Check each file against the database (by path)
4. Create records for new files
5. Run FFprobe on new files
6. Update progress: files processed / total files

**Typical Duration:** 2-30 seconds (depends on file count)

---

### Validation Jobs (`validate`)

**Purpose:** Re-run FFprobe to refresh video metadata

**Parameters:**

```json
{
  "videoId": "660e8400-e29b-41d4-a716-446655440001"
}
```

**Process:**

1. Fetch video record from database
2. Build full file path
3. Run FFprobe extraction
4. Update database with fresh metadata
5. Mark video as valid/invalid based on result

**Typical Duration:** 100-500ms per video

---

### Re-encode Jobs (`reencode_streaming`)

**Purpose:** Convert video to a web-optimized format (H.264, web-friendly profile)

**Parameters:**

```json
{
  "videoId": "660e8400-e29b-41d4-a716-446655440001",
  "targetBitrate": 2000,
  "preset": "medium",
  "crf": 23
}
```

**FFmpeg Command:**

```bash
ffmpeg -i /media/local/inbox/original.mp4 \
  -c:v libx264 \
  -preset medium \
  -crf 23 \
  -maxrate 2000k \
  -bufsize 4000k \
  -c:a aac \
  -b:a 128k \
  -movflags +faststart \
  /media/local/playback/encoded.mp4
```

`-crf 23` sets the target quality while `-maxrate`/`-bufsize` cap the peak bitrate, and `-movflags +faststart` moves the MP4 index to the front of the file so browsers can start playback before the download completes. Note that `libx264` is a software (CPU) encoder even though these jobs are scheduled in the `gpu_encode` category; encoding on the GPU would require a hardware encoder such as `h264_nvenc`.

**Process:**

1. Validate input file exists
2. Build FFmpeg command
3. Start encoding process
4. Parse FFmpeg progress output
5. Update job progress every ~5%
6. Create new video record for encoded file
7. Update original video `reencodeJobId` reference

**Typical Duration:** 5-30 minutes (depends on video length and resolution)

---

### Compilation Jobs (`compile_random`, `compile_quad`, `compile_mega`)

**Purpose:** Merge multiple videos into a single compilation

**Parameters (Random):**

```json
{
  "count": 10,
  "minDuration": 30,
  "maxDuration": 120,
  "orientation": "landscape",
  "outputPath": "compilations/random-001.mp4"
}
```

**Process:**

1. Query database for videos matching criteria (orientation, duration range)
2. Randomly select `count` videos
3. Build FFmpeg concat demuxer file list
4. Run FFmpeg compilation
5. Create new video record for compilation
6. Update progress based on FFmpeg output

**Quad Compilation (4-up grid):**

```bash
ffmpeg -i video1.mp4 -i video2.mp4 -i video3.mp4 -i video4.mp4 \
  -filter_complex "[0:v][1:v][2:v][3:v]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v]" \
  -map "[v]" \
  output.mp4
```

**Typical Duration:** 10-60 minutes

---

### Digest Generation (`digest_generate`)

**Purpose:** AI-powered video digest creation (future feature)

**Parameters:**

```json
{
  "videoId": "660e8400-e29b-41d4-a716-446655440001",
  "targetLength": 60,
  "includeHighlights": true
}
```

**Process (Planned):**

1. Extract frames at 1 FPS
2. Run AI scene detection
3. Identify highlights (action, faces, motion)
4. Select best segments totaling target length
5. Compile segments into digest video

**GPU AI Required:** 8GB VRAM

---

### Thumbnail Generation (`thumbnail_generate`)

**Purpose:** Extract a thumbnail image from a video

**Parameters:**

```json
{
  "videoId": "660e8400-e29b-41d4-a716-446655440001",
  "timestamp": 5,
  "width": 640
}
```

**FFmpeg Command:**

```bash
ffmpeg -i /media/local/library/videos/sample.mp4 \
  -ss 00:00:05 \
  -vframes 1 \
  -vf scale=640:-1 \
  /media/local/thumbnails/sample.jpg
```

**Process:**

1. Seek to timestamp (default: 25% into video)
2. Extract single frame
3. Scale to width (preserve aspect ratio)
4. Save as JPEG
5. Update video record with `thumbnailPath`

**Typical Duration:** 1-5 seconds

---

## Code Examples

### Create Re-encode Job

```typescript
// api/src/modules/media/routes/jobs.routes.ts
import { eq } from 'drizzle-orm';
import { db } from '@/modules/media/db';
import { jobs, videos } from '@/modules/media/db/schema';

interface ReencodeBody {
  videoId: string;
  targetBitrate?: number;
  preset?: string;
  crf?: number;
}

// `app` is the Fastify instance this route file registers on
app.post<{ Body: ReencodeBody }>('/api/media/jobs/reencode', async (req, reply) => {
  const { videoId, targetBitrate = 2000, preset = 'medium', crf = 23 } = req.body;

  // Fetch video
  const [video] = await db
    .select()
    .from(videos)
    .where(eq(videos.id, videoId))
    .limit(1);

  if (!video) {
    return reply.code(404).send({ error: 'Video not found' });
  }

  // Create job
  const [job] = await db
    .insert(jobs)
    .values({
      type: 'reencode_streaming',
      status: 'pending',
      params: {
        videoId,
        inputPath: video.path,
        outputPath: `playback/${video.filename}`,
        targetBitrate,
        preset,
        crf,
      },
      resourceCategory: 'gpu_encode',
      vramRequired: 4000,
      priority: 5,
    })
    .returning();

  reply.send(job);
});
```

---

### Job Worker (Polling Loop)

```typescript
// api/src/modules/media/services/job-worker.service.ts
import path from 'node:path';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { db } from '@/modules/media/db';
import { jobs } from '@/modules/media/db/schema';
import { eq, and, or, lte, isNull, sql } from 'drizzle-orm';

const execFileAsync = promisify(execFile);

type Job = typeof jobs.$inferSelect;

export class JobWorkerService {
  private polling = false;

  async start() {
    this.polling = true;
    console.log('Job worker started');

    while (this.polling) {
      try {
        await this.processNextJob();
      } catch (error) {
        console.error('Job worker error:', error);
      }

      // Wait 5 seconds before next poll
      await new Promise((resolve) => setTimeout(resolve, 5000));
    }
  }

  async stop() {
    this.polling = false;
    console.log('Job worker stopped');
  }

  private async processNextJob() {
    // Find the next pending job whose backoff window (retryAfter) has passed:
    // highest priority (lowest number) first, then oldest first
    const [job] = await db
      .select()
      .from(jobs)
      .where(
        and(
          eq(jobs.status, 'pending'),
          or(isNull(jobs.retryAfter), lte(jobs.retryAfter, new Date())),
        ),
      )
      .orderBy(jobs.priority, jobs.createdAt)
      .limit(1);

    if (!job) {
      return; // No jobs in queue
    }

    // Check resource availability
    const canRun = await this.checkResources(job);

    if (!canRun) {
      // Record why the job is still waiting
      await db
        .update(jobs)
        .set({
          waitingReason: 'Insufficient resources',
        })
        .where(eq(jobs.id, job.id));
      return;
    }

    // Mark the job as running before the next poll, then execute it in the
    // background so the per-category concurrency limits can actually be used
    await db
      .update(jobs)
      .set({ status: 'running', startedAt: new Date(), waitingReason: null })
      .where(eq(jobs.id, job.id));

    void this.executeJob(job);
  }

  private async checkResources(job: Job): Promise<boolean> {
    // Get running jobs
    const runningJobs = await db.select().from(jobs).where(eq(jobs.status, 'running'));

    // Calculate total VRAM used
    const totalVramUsed = runningJobs.reduce((sum, j) => sum + (j.vramRequired ?? 0), 0);

    const TOTAL_VRAM = 16000; // 16GB GPU
    const available = TOTAL_VRAM - totalVramUsed;

    if ((job.vramRequired ?? 0) > available) {
      return false; // Not enough VRAM
    }

    // Check concurrent job limits by category
    const categoryCount = runningJobs.filter(
      (j) => j.resourceCategory === job.resourceCategory,
    ).length;

    const limits: Record<string, number> = {
      cpu: 5,
      gpu_encode: 2,
      gpu_ai: 1,
    };

    if (categoryCount >= (limits[job.resourceCategory] ?? 1)) {
      return false; // Category limit reached
    }

    return true; // Resources available
  }

  private async executeJob(job: Job) {
    try {
      // Execute job based on type
      switch (job.type) {
        case 'reencode_streaming':
          await this.executeReencode(job);
          break;
        case 'scan':
          await this.executeScan(job);
          break;
        case 'thumbnail_generate':
          await this.executeThumbnail(job);
          break;
        // ... other job types
        default:
          throw new Error(`Unknown job type: ${job.type}`);
      }

      // Mark as completed
      await db
        .update(jobs)
        .set({ status: 'completed', progress: 100, completedAt: new Date() })
        .where(eq(jobs.id, job.id));
    } catch (error: any) {
      // Append via SQL so log entries written during execution are preserved
      await this.appendLog(job.id, `--- ERROR ---\n${error.message}`);

      const retryCount = job.retryCount ?? 0;
      const maxRetries = job.maxRetries ?? 3;

      if (retryCount < maxRetries) {
        // Exponential backoff: retry N waits 2^N minutes (2, 4, 8)
        const nextRetry = retryCount + 1;
        const retryDelay = 2 ** nextRetry * 60 * 1000;

        await db
          .update(jobs)
          .set({
            status: 'pending',
            retryCount: nextRetry,
            retryAfter: new Date(Date.now() + retryDelay),
          })
          .where(eq(jobs.id, job.id));
      } else {
        // Retries exhausted: leave the job failed for manual review
        await db.update(jobs).set({ status: 'failed' }).where(eq(jobs.id, job.id));
      }
    }
  }

  private async executeReencode(job: Job) {
    const { inputPath, outputPath, targetBitrate, preset, crf } = job.params as {
      inputPath: string;
      outputPath: string;
      targetBitrate: number;
      preset: string;
      crf: number;
    };
    const inputFull = path.join(process.env.MEDIA_LIBRARY_PATH!, inputPath);
    const outputFull = path.join(process.env.MEDIA_LIBRARY_PATH!, outputPath);

    // Arguments are passed as an array (no shell), so file names containing
    // spaces or quotes cannot break or inject into the command
    const args = [
      '-i', inputFull,
      '-c:v', 'libx264',
      '-preset', preset,
      '-crf', String(crf),
      '-maxrate', `${targetBitrate}k`,
      '-bufsize', `${targetBitrate * 2}k`,
      '-c:a', 'aac',
      '-b:a', '128k',
      '-movflags', '+faststart',
      outputFull,
    ];

    await this.appendLog(job.id, `Starting re-encode\nCommand: ffmpeg ${args.join(' ')}`);

    // Simplified: the real implementation uses spawn() to parse progress output.
    // A larger maxBuffer keeps FFmpeg's verbose stderr from aborting the call.
    await execFileAsync('ffmpeg', args, { maxBuffer: 64 * 1024 * 1024 });

    await this.appendLog(job.id, 'Re-encode completed successfully');
  }

  // executeScan, executeThumbnail, ... omitted for brevity

  private async appendLog(jobId: string, message: string) {
    const timestamp = new Date().toISOString();
    const logEntry = `[${timestamp}] ${message}`;

    // Concatenate in SQL so the append never overwrites newer entries
    await db
      .update(jobs)
      .set({ log: sql`coalesce(${jobs.log}, '') || ${'\n' + logEntry}` })
      .where(eq(jobs.id, jobId));
  }
}

// Start worker (the polling loop runs until stop() is called)
export const jobWorker = new JobWorkerService();
void jobWorker.start();
```

---

### Frontend: Jobs Page

```typescript
// admin/src/pages/media/MediaJobsPage.tsx
import { Table, Tag, Progress, Button, Space, Select, message } from 'antd';
import { useEffect, useState } from 'react';
import { mediaApi } from '@/lib/media-api';

export default function MediaJobsPage() {
  const [jobs, setJobs] = useState<any[]>([]);
  const [loading, setLoading] = useState(false);
  const [filter, setFilter] = useState<{ status?: string; type?: string }>({
    status: undefined,
    type: undefined,
  });
  const [polling, setPolling] = useState(true);

  const fetchJobs = async () => {
    setLoading(true);
    try {
      const { data } = await mediaApi.get('/api/media/jobs', {
        params: filter,
      });
      setJobs(data.data);
    } catch (error) {
      console.error('Failed to fetch jobs:', error);
    } finally {
      setLoading(false);
    }
  };

  useEffect(() => {
    fetchJobs();
  }, [filter]);

  // Poll for running jobs every 2 seconds
  useEffect(() => {
    if (!polling) return;

    const interval = setInterval(() => {
      const hasRunning = jobs.some((j: any) => j.status === 'running');
      if (hasRunning) {
        fetchJobs();
      }
    }, 2000);

    return () => clearInterval(interval);
  }, [polling, jobs]);

  const handleRetry = async (id: string) => {
    try {
      await mediaApi.post(`/api/media/jobs/${id}/retry`);
      message.success('Job queued for retry');
      fetchJobs();
    } catch (error) {
      message.error('Retry failed');
    }
  };

  const handleCancel = async (id: string) => {
    try {
      await mediaApi.post(`/api/media/jobs/${id}/cancel`);
      message.success('Job cancelled');
      fetchJobs();
    } catch (error) {
      message.error('Cancel failed');
    }
  };

  const statusColors: Record<string, string> = {
    pending: 'default',
    queued: 'blue',
    running: 'processing',
    completed: 'success',
    failed: 'error',
    cancelled: 'default',
    paused: 'warning',
  };

  const columns = [
    {
      title: 'Type',
      dataIndex: 'type',
      width: 150,
      render: (type: string) => <Tag>{type}</Tag>,
    },
    {
      title: 'Status',
      dataIndex: 'status',
      width: 100,
      render: (status: string) => <Tag color={statusColors[status]}>{status.toUpperCase()}</Tag>,
    },
    {
      title: 'Progress',
      dataIndex: 'progress',
      width: 150,
      render: (progress: number, record: any) =>
        record.status === 'running' ? (
          <Progress percent={progress} size="small" status="active" />
        ) : record.status === 'completed' ? (
          <Progress percent={100} size="small" />
        ) : record.status === 'failed' ? (
          <Progress percent={progress} size="small" status="exception" />
        ) : (
          <Progress percent={progress} size="small" />
        ),
    },
    {
      title: 'Resource',
      dataIndex: 'resourceCategory',
      width: 120,
    },
    {
      title: 'Priority',
      dataIndex: 'priority',
      width: 80,
      render: (priority: number) => <Tag>{priority}</Tag>,
    },
    {
      title: 'Created',
      dataIndex: 'createdAt',
      width: 150,
      render: (date: string) => new Date(date).toLocaleString(),
    },
    {
      title: 'Actions',
      width: 200,
      render: (_: any, record: any) => (
        <Space>
          {record.status === 'failed' && (
            <Button size="small" onClick={() => handleRetry(record.id)}>
              Retry
            </Button>
          )}
          {['pending', 'queued', 'running'].includes(record.status) && (
            <Button size="small" danger onClick={() => handleCancel(record.id)}>
              Cancel
            </Button>
          )}
        </Space>
      ),
    },
  ];

  return (
    <div>
      <Space style={{ marginBottom: 16 }}>
        <Select
          allowClear
          placeholder="Status"
          style={{ width: 160 }}
          value={filter.status}
          onChange={(status) => setFilter({ ...filter, status })}
          options={['pending', 'running', 'completed', 'failed'].map((s) => ({ label: s, value: s }))}
        />
      </Space>
      <Table rowKey="id" columns={columns} dataSource={jobs} loading={loading} />
    </div>
  );
}
```

---

## Troubleshooting

### Problem: Jobs Stuck in Pending

**Symptoms:**

- Jobs created but never start
- Status remains "pending" for hours
- No "running" jobs visible

**Solutions:**

1. **Check worker process running:**
   ```bash
   docker compose ps media-api
   # Should show "Up" status

   docker compose logs media-api | grep "Job worker"
   # Should show "Job worker started"
   ```

2. **Restart the worker:**
   ```bash
   # Restart media-api container
   docker compose restart media-api
   # Worker starts automatically on container boot
   ```

3. **Check worker logs for errors:**
   ```bash
   docker compose logs -f media-api | grep ERROR
   # Look for database connection errors, permission issues
   ```

4. **Verify database connection:**
   ```bash
   # Test that the database is reachable from inside the container
   # ($DATABASE_URL is escaped so it expands in the container, not on the host;
   # assumes psql is installed in the media-api image)
   docker compose exec media-api sh -c "psql \"\$DATABASE_URL\" -c \"SELECT COUNT(*) FROM jobs WHERE status='pending';\""
   ```

---

### Problem: Job Fails Immediately

**Symptoms:**

- Job status changes from pending → running → failed within seconds
- No meaningful progress
- Error in log: "Command not found" or "Permission denied"

**Solutions:**

1. **Check job log in database:**
   ```sql
   SELECT log FROM jobs WHERE id = 'JOB_ID';
   ```

2. **Verify FFmpeg installed:**
   ```bash
   docker compose exec media-api which ffmpeg
   # Should output: /usr/bin/ffmpeg

   docker compose exec media-api ffmpeg -version
   ```

3. **Check file paths valid:**
   ```bash
   # Verify input file exists
   docker compose exec media-api ls -la /media/local/library/inbox/original.mp4

   # Check output directory writable
   docker compose exec media-api touch /media/local/playback/test.txt
   ```

4. **Test FFmpeg command manually:**
   ```bash
   # Copy command from job log, run manually
   docker compose exec media-api ffmpeg -i /media/local/inbox/test.mp4 -c:v libx264 /media/local/playback/test-output.mp4
   ```

---

### Problem: Re-encode Job Hangs at Same Progress

**Symptoms:**

- Job progress reaches 25%, 50%, or 75% then stops updating
- Status remains "running" for hours
- No CPU/GPU activity visible

**Solutions:**

1. **Check FFmpeg process still running:**
   ```bash
   docker compose exec media-api ps aux | grep ffmpeg
   # Should show ffmpeg process

   # If not running, worker crashed
   docker compose logs media-api --tail 100
   ```

2. **Kill hung FFmpeg process:**
   ```bash
   docker compose exec media-api pkill -9 ffmpeg
   # Job will fail and can be retried
   ```

3. **Check disk space:**
   ```bash
   docker compose exec media-api df -h /media/local/playback
   # If 100% full, encoding fails

   # Free space (sh -c makes the glob expand inside the container)
   docker compose exec media-api sh -c 'rm -f /media/local/playback/*.partial'
   ```

4. **Increase FFmpeg timeout (if very large file):**
   ```typescript
   // api/src/modules/media/services/job-worker.service.ts
   const FFMPEG_TIMEOUT = 3600000; // 1 hour (from 30 minutes)
   ```

---

### Problem: GPU Out of Memory Errors

**Symptoms:**

- Multiple GPU jobs running simultaneously
- Error in log: "CUDA out of memory" or "Cannot allocate memory"
- System becomes unresponsive

**Solutions:**

1. **Check total VRAM available:**
   ```bash
   nvidia-smi
   # Shows GPU memory usage
   # Should show < 16GB used (adjust based on your GPU)
   ```

2. **Reduce concurrent GPU job limit:**
   ```typescript
   // api/src/modules/media/services/job-worker.service.ts
   const limits = {
     cpu: 5,
     gpu_encode: 1, // Reduced from 2
     gpu_ai: 1,
   };
   ```

3. **Increase VRAM requirements for jobs:**
   ```typescript
   // Jobs require more VRAM than specified
   // Update job creation to use higher vramRequired values
   {
     type: 'reencode_streaming',
     vramRequired: 6000, // Increased from 4000
   }
   ```

4. **Kill running GPU jobs:**
   ```bash
   # Stop all media jobs
   docker compose exec media-api pkill -9 ffmpeg

   # Update stuck jobs to failed status
   docker compose exec v2-postgres psql -U changemaker -d v2_changemaker \
     -c "UPDATE jobs SET status='failed' WHERE status='running';"
   ```

---

## Performance Considerations

### Job Queue Throughput

**Scaling Factors:**

- CPU jobs: 5 concurrent = ~10-20 jobs/minute (scans, validations)
- GPU encode: 2 concurrent = ~4-8 videos/hour (depends on length)
- GPU AI: 1 concurrent = ~2-6 videos/hour (depends on complexity)

**Bottlenecks:**

1. **GPU Memory** — Limits concurrent GPU jobs
2. **Disk I/O** — Reading/writing large video files
3. **CPU** — FFmpeg encoding uses all available cores

**Optimization:**

- **Distribute workers across multiple machines** — Each machine runs a separate worker process. This requires claiming jobs atomically (e.g., `SELECT ... FOR UPDATE SKIP LOCKED`) so two workers never start the same job, and tracking VRAM per machine, because the worker's check sums all running jobs against a single 16 GB budget.
- **Use job priority** — Urgent jobs (priority 1-3) run first
- **Batch similar jobs** — Group scan jobs, re-encode jobs, etc. for efficiency

---

### Database Performance

**Job Queue Index:**

```sql
CREATE INDEX idx_jobs_status_priority ON jobs(status, priority, created_at);
```

**Query Performance:**

- Find next pending job: ~1-5ms (with index)
- Update job status: ~2-10ms
- Fetch job logs: ~5-20ms

**Optimization:**

- **Archive old jobs** — Move old completed/failed jobs to an archive table (or partition the jobs table by date)
- **Limit log size** — Truncate logs > 10KB to prevent bloat

---

## Monitoring & Observability

### Prometheus Metrics

```typescript
// api/src/utils/metrics.ts
import { Counter, Gauge } from 'prom-client';

export const mediaJobsTotal = new Counter({
  name: 'media_jobs_total',
  help: 'Total media jobs created',
  labelNames: ['type', 'status'],
});

export const mediaJobsPending = new Gauge({
  name: 'media_jobs_pending',
  help: 'Number of pending media jobs',
});

export const mediaJobsRunning = new Gauge({
  name: 'media_jobs_running',
  help: 'Number of running media jobs',
  labelNames: ['resourceCategory'],
});

export const mediaVramUsed = new Gauge({
  name: 'media_vram_used_mb',
  help: 'Total VRAM used by running jobs (MB)',
});

// Update metrics in the worker (pendingCount, gpuEncodeCount and totalVramUsed
// are computed from the jobs table on each poll)
mediaJobsPending.set(pendingCount);
mediaJobsRunning.set({ resourceCategory: 'gpu_encode' }, gpuEncodeCount);
mediaVramUsed.set(totalVramUsed);
```

### Grafana Dashboard Panel

**Job Queue Status:**

```promql
# Pending jobs count
media_jobs_pending

# Running jobs by category
sum(media_jobs_running) by (resourceCategory)

# VRAM usage percentage
(media_vram_used_mb / 16000) * 100
```

**Alert Rules:**

```yaml
# configs/prometheus/alerts.yml
groups:
  - name: media_jobs
    rules:
      - alert: MediaJobQueueBacklog
        expr: media_jobs_pending > 50
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Media job queue backlog"
          description: "{{ $value }} jobs pending; backlog above 50 for 30+ minutes"

      - alert: MediaJobsStuckRunning
        expr: sum(media_jobs_running) == 0 and on() media_jobs_pending > 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Media jobs stuck"
          description: "Jobs pending but worker not processing"
```

`and on()` is needed in the second rule because `sum(...)` drops all labels, so a plain `and` would never match the labeled `media_jobs_pending` series.

---

## Related Documentation

### Backend Documentation

- **Job Worker:** `backend/modules/media/job-worker.md` — Worker process implementation
- **Job Processors:** `backend/modules/media/processors/` — Individual job type processors (reencode, scan, etc.)
- **Jobs Routes:** `backend/modules/media/jobs.md` — API endpoints for job management

### Frontend Documentation

- **Jobs Page:** `frontend/pages/media/jobs.md` — Job queue monitoring UI
- **Job Detail Modal:** `frontend/components/media/job-detail.md` — Log viewer component

### Feature Documentation

- **Video Library:** `features/media/video-library.md` — Triggering jobs from library actions
- **Upload System:** `features/media/upload.md` — Post-upload job creation

---

## Next Steps

After mastering the job queue:

1. **Create Custom Jobs** — Implement new job types for domain-specific processing
2. **Optimize Scheduling** — Tune resource limits and priority settings for your workload
3. **Monitor Performance** — Set up Grafana dashboards and alerts for job queue health
4. **Distributed Workers** — Scale horizontally by running workers on multiple machines

**Hands-On Practice:**

```bash
# 1. Create re-encode job
curl -X POST http://localhost:4100/api/media/jobs \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "reencode_streaming",
    "params": {
      "videoId": "VIDEO_ID",
      "targetBitrate": 2000
    },
    "priority": 5,
    "resourceCategory": "gpu_encode",
    "vramRequired": 4000
  }'

# 2. Monitor job progress
watch -n 2 'curl -s -H "Authorization: Bearer YOUR_TOKEN" http://localhost:4100/api/media/jobs/JOB_ID | jq ".progress"'

# 3. View job logs
curl -s -H "Authorization: Bearer YOUR_TOKEN" http://localhost:4100/api/media/jobs/JOB_ID | jq -r ".log"

# 4. Check queue stats
curl -s -H "Authorization: Bearer YOUR_TOKEN" http://localhost:4100/api/media/jobs/stats | jq
```

---

**Last Updated:** 2026-02-13

**Version:** V2.0

**Maintainer:** Changemaker Lite Team