
ObservabilityPage

Overview

File: admin/src/pages/ObservabilityPage.tsx
Route: /app/observability
Role Requirements: SUPER_ADMIN

ObservabilityPage is the system monitoring and alerting dashboard for Changemaker Lite's observability stack. It provides a single interface for Prometheus metrics, Grafana dashboards, and Alertmanager alerts. The page has three tabs (Overview, Monitoring, Alerts). The Overview tab shows status cards for 7 monitoring services, a key metrics grid, and an active alerts table. The Monitoring and Alerts tabs embed Grafana and Alertmanager in lazy-loaded iframes.

The page integrates with:

  • Prometheus (port 9090) - Metrics collection and time-series database
  • Grafana (port 3001) - Metrics visualization and dashboards
  • Alertmanager (port 9093) - Alert management and routing
  • cAdvisor (port 8080) - Container metrics
  • Node Exporter (port 9100) - Host system metrics
  • Redis Exporter (port 9121) - Redis metrics
  • Gotify (port 8889) - Notification service

Key Features:

  • Three-tab interface (Overview/Monitoring/Alerts) with radio button switcher
  • Service status cards (7 services) with online/offline indicators
  • Metrics grid showing key application metrics (API uptime, queue size, sessions, etc.)
  • Active alerts table with severity indicators
  • Lazy-loaded Grafana iframe (Application Overview dashboard)
  • Lazy-loaded Alertmanager iframe
  • Auto-start banner for offline services
  • "Open Grafana" button for full-screen access

Key Components:

  • ServiceStatusCard for each monitoring service
  • MetricsGrid for application metrics
  • AlertsTable for active alerts
  • IframeErrorBoundary for iframe error handling
  • Radio.Group for tab switching

Screenshot

[Screenshot: ObservabilityPage showing three-tab interface at top (Overview/Monitoring/Alerts radio buttons), Overview tab displaying service status cards in grid (Prometheus, Grafana, Alertmanager, cAdvisor, Node Exporter, Redis Exporter, Gotify) with green/red online/offline indicators, key metrics grid below showing API stats, and active alerts table at bottom. Header has Refresh and "Open Grafana" buttons.]


Features

Core Features

  1. Three-Tab Interface

    • Overview Tab: Service status + metrics + alerts summary
    • Monitoring Tab: Embedded Grafana Application Overview dashboard
    • Alerts Tab: Embedded Alertmanager UI
    • Radio button switcher in page header
    • Tab state kept in component state (activeTab), so the selection persists until the page unmounts
  2. Service Status Monitoring

    • 7 service status cards:
      • Prometheus - Metrics database
      • Grafana - Dashboard visualization
      • Alertmanager - Alert management
      • cAdvisor - Container metrics
      • Node Exporter - Host metrics
      • Redis Exporter - Redis metrics
      • Gotify - Notification service
    • Online/offline badge indicators
    • Clickable URL to open service in new tab
    • Responsive grid layout (4 columns on desktop, 2 on tablet, 1 on mobile)
  3. Auto-Start Banner

    • Warning alert at top of Overview tab when all services offline
    • Shows Docker Compose command to start monitoring services
    • Command: docker compose --profile monitoring up -d
    • Only shows when servicesOnline === 0
  4. Key Metrics Grid

    • Displays application-specific metrics from Prometheus
    • Examples: API uptime, email queue size, active canvass sessions, total locations
    • Only visible when at least one service online
    • Powered by MetricsGrid component
  5. Active Alerts Table

    • Shows currently firing alerts from Alertmanager
    • Columns: Alert name, severity, status, start time
    • Color-coded severity (critical=red, warning=orange, info=blue)
    • Only visible when at least one service online
    • Powered by AlertsTable component
  6. Grafana Dashboard Iframe

    • Embedded Application Overview dashboard
    • Lazy-loaded (only loads when Monitoring tab selected)
    • Full-height iframe (calc(100vh - 200px))
    • Sandboxed for security (allow-scripts, allow-same-origin, allow-forms)
    • Error boundary for graceful failure handling
    • Shows warning if Grafana offline
  7. Alertmanager Iframe

    • Embedded Alertmanager UI
    • Lazy-loaded (only loads when Alerts tab selected)
    • Full-height iframe (calc(100vh - 200px))
    • Sandboxed for security
    • Error boundary for graceful failure handling
    • Shows warning if Alertmanager offline
  8. Refresh Button

    • Refreshes all data (status, metrics, alerts) in parallel
    • Visible in all tabs
    • Loading state during refresh
  9. Open Grafana Button

    • Primary button in header (blue)
    • Opens Grafana in new tab at full URL
    • Only visible when Grafana online
    • Provides full-screen Grafana access

User Workflow

Viewing System Status (Overview Tab)

  1. Navigate to page: Admin sidebar → System → Observability
  2. Overview tab loads: Shows service status cards, metrics grid, alerts table
  3. Check service status: Green badges = online, red badges = offline
  4. Review metrics: Scan key application metrics (uptime, queue size, etc.)
  5. Check alerts: Review active alerts table for firing alerts

Starting Monitoring Services

If all services offline:

  1. See warning banner: Yellow alert at top with Docker Compose command
  2. Copy command: docker compose --profile monitoring up -d
  3. Run in terminal: Execute command in project directory
  4. Wait ~30 seconds: Services take time to start
  5. Click Refresh: Refetch service status to verify the services are online
  6. Banner disappears: Warning banner no longer shown

Viewing Grafana Dashboards

  1. Click "Monitoring" tab: Radio button in header
  2. Grafana iframe loads: Embedded Application Overview dashboard
  3. Interact with dashboard: Pan, zoom, change time range, etc.
  4. Full-screen access: Click "Open Grafana" button for new tab
  5. Explore more dashboards: In Grafana UI, browse other dashboards (Host Metrics, Docker Containers, etc.)

Managing Alerts

  1. Click "Alerts" tab: Radio button in header
  2. Alertmanager iframe loads: Embedded alert management UI
  3. View alert groups: See all firing alerts grouped by label
  4. Silence alerts: Click Silence button to temporarily suppress
  5. Review routing: The Alertmanager Status page shows the loaded configuration; routing rules are changed in the Alertmanager configuration file, not in the UI

Refreshing Data

  1. Click Refresh button: In header (any tab)
  2. All data reloads: Service status, metrics, alerts fetched in parallel
  3. Loading state: Brief spinner or loading indicator
  4. Data updates: New status/metrics/alerts displayed

Opening Service Directly

  1. Click on service status card URL (if service online)
  2. New tab opens: Direct access to service (e.g., Prometheus, Grafana, Alertmanager)
  3. Full service UI: No iframe restrictions, full functionality

Component Breakdown

Tab Switcher (Header)

<Radio.Group
  value={activeTab}
  onChange={e => setActiveTab(e.target.value)}
  buttonStyle="solid"
>
  <Radio.Button value="overview">
    <DashboardOutlined /> Overview
  </Radio.Button>
  <Radio.Button value="monitoring">
    <LineChartOutlined /> Monitoring
  </Radio.Button>
  <Radio.Button value="alerts">
    <AlertOutlined /> Alerts
  </Radio.Button>
</Radio.Group>

Solid button style: Active tab highlighted with blue background.

Service Status Card

<ServiceStatusCard
  name="Prometheus"
  online={status?.prometheus?.online || false}
  url={status?.prometheus?.url || ''}
  icon={<DashboardOutlined />}
/>

ServiceStatusCard Component:

interface ServiceStatusCardProps {
  name: string;
  online: boolean;
  url: string;
  icon: React.ReactNode;
}

// Displays:
// - Service name (bold)
// - Badge (green "Online" or red "Offline")
// - Icon
// - Clickable link to service URL (if online)

Auto-Start Banner

{allOffline && (
  <Alert
    message="Monitoring services are offline"
    description={
      <>
        Start monitoring services with: <code>docker compose --profile monitoring up -d</code>
      </>
    }
    type="warning"
    showIcon
    style={{ marginBottom: 16 }}
  />
)}

Condition: allOffline = servicesOnline === 0

Service Status Grid

<Card title="Service Status" style={{ marginBottom: 16 }}>
  <Row gutter={[16, 16]}>
    <Col xs={24} sm={12} lg={6}>
      <ServiceStatusCard name="Prometheus" online={...} url={...} icon={<DashboardOutlined />} />
    </Col>
    <Col xs={24} sm={12} lg={6}>
      <ServiceStatusCard name="Grafana" online={...} url={...} icon={<LineChartOutlined />} />
    </Col>
    {/* 5 more cards... */}
  </Row>
</Card>

Responsive Grid:

  • Desktop (lg, ≥ 992px): 4 columns (6/24 = 25% width each)
  • Tablet (sm, ≥ 576px): 2 columns (12/24 = 50% width each)
  • Mobile (xs, < 576px): 1 column (24/24 = 100% width)
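
The breakpoint arithmetic above can be sketched as a small calculation. This is an illustration only: cardsPerRow and rowsNeeded are hypothetical helpers, not part of the page, and the pixel thresholds are the Ant Design grid minimums listed above.

```typescript
// Ant Design grid breakpoints (minimum viewport widths in px).
const SM_MIN = 576;
const LG_MIN = 992;

// Column spans copied from <Col xs={24} sm={12} lg={6}>.
const SPANS = { xs: 24, sm: 12, lg: 6 } as const;

// Hypothetical helper: how many cards fit on one row at a given viewport width.
// Because no md span is set, the sm span stays in effect until lg is reached.
function cardsPerRow(viewportWidth: number): number {
  const span =
    viewportWidth >= LG_MIN ? SPANS.lg :
    viewportWidth >= SM_MIN ? SPANS.sm :
    SPANS.xs;
  return 24 / span; // the grid is 24 units wide
}

// Hypothetical helper: rows needed to lay out all service cards.
function rowsNeeded(cardCount: number, viewportWidth: number): number {
  return Math.ceil(cardCount / cardsPerRow(viewportWidth));
}

console.log(cardsPerRow(1280), rowsNeeded(7, 1280));
console.log(cardsPerRow(800), rowsNeeded(7, 800));
console.log(cardsPerRow(400), rowsNeeded(7, 400));
```

With 7 service cards, the desktop layout wraps onto a second row that holds three cards.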

Metrics Grid

{!allOffline && <MetricsGrid metrics={metrics} loading={loading} />}

MetricsGrid Component:

  • Displays application metrics from Prometheus
  • Examples: API uptime, email queue size, active sessions, location count
  • Styled as grid of Statistic cards
  • Only renders when at least one service online

Alerts Table

{!allOffline && alerts && (
  <AlertsTable alerts={alerts.alerts || []} loading={loading} />
)}

AlertsTable Component:

  • Ant Design Table with columns:
    • Alert name
    • Severity (color-coded tag)
    • Status (firing/resolved)
    • Start time (relative)
  • Pagination if > 10 alerts
  • Only renders when at least one service online
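
The severity tag and relative start time columns could be driven by helpers along these lines. This is a minimal sketch, not the actual AlertsTable code: severityColor and relativeStart are hypothetical names, the "default" fallback color is an assumption, and the color mapping follows the critical=red, warning=orange, info=blue scheme described in Features.

```typescript
type Severity = "critical" | "warning" | "info";

// Severity-to-color mapping as described in the feature list.
const SEVERITY_COLORS: Record<Severity, string> = {
  critical: "red",
  warning: "orange",
  info: "blue",
};

// Hypothetical helper: tag color for a severity string, with a neutral
// fallback (assumption) for values outside the three known severities.
function severityColor(severity: string): string {
  return Object.prototype.hasOwnProperty.call(SEVERITY_COLORS, severity)
    ? SEVERITY_COLORS[severity as Severity]
    : "default";
}

// Hypothetical helper: coarse relative time for an ISO 8601 start time.
// `now` is injectable so the function stays deterministic in tests.
function relativeStart(startTime: string, now: Date = new Date()): string {
  const elapsedMs = now.getTime() - new Date(startTime).getTime();
  const seconds = Math.max(0, Math.floor(elapsedMs / 1000));
  if (seconds < 60) return `${seconds}s ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}

console.log(severityColor("warning"));
console.log(relativeStart("2026-02-11T10:30:00Z", new Date("2026-02-11T11:00:00Z")));
```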

Grafana Iframe (Monitoring Tab)

<IframeErrorBoundary serviceName="Grafana">
  <Card styles={{ body: { padding: 0 } }}>
    {grafanaIframeSrc ? (
      <iframe
        src={grafanaIframeSrc}
        style={{
          width: '100%',
          height: 'calc(100vh - 200px)',
          border: 'none',
        }}
        title="Grafana Dashboard"
        aria-label="Embedded Grafana application overview dashboard"
        sandbox="allow-scripts allow-same-origin allow-forms"
        referrerPolicy="strict-origin-when-cross-origin"
        loading="lazy"
      />
    ) : (
      <Spin />
    )}
  </Card>
</IframeErrorBoundary>

Lazy Loading Logic:

useEffect(() => {
  if (activeTab === 'monitoring' && !grafanaInitialized.current && status?.grafana.online) {
    try {
      const url = buildMonitoringUrl('grafana', 3005, '/d/application-overview');
      setGrafanaIframeSrc(url);
      grafanaInitialized.current = true;
    } catch (error) {
      console.error('Failed to construct Grafana URL:', error);
    }
  }
}, [activeTab, status]);

Pattern: Iframe src set only when:

  1. Monitoring tab selected
  2. Not already initialized (ref tracks this)
  3. Grafana is online

Alertmanager Iframe (Alerts Tab)

<IframeErrorBoundary serviceName="Alertmanager">
  <Card styles={{ body: { padding: 0 } }}>
    {alertmanagerIframeSrc ? (
      <iframe
        src={alertmanagerIframeSrc}
        style={{
          width: '100%',
          height: 'calc(100vh - 200px)',
          border: 'none',
        }}
        title="Alertmanager"
        aria-label="Embedded Alertmanager alert management interface"
        sandbox="allow-scripts allow-same-origin allow-forms"
        referrerPolicy="strict-origin-when-cross-origin"
        loading="lazy"
      />
    ) : (
      <Spin />
    )}
  </Card>
</IframeErrorBoundary>

Same lazy loading pattern as Grafana.
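
Since both iframes share the same three-part guard, the decision can be expressed as one pure function. The sketch below is framework-free and illustrative: shouldInitIframe and LazyIframeInput are hypothetical names, and the initialized flag stands in for the .current value of the corresponding ref.

```typescript
type TabKey = "overview" | "monitoring" | "alerts";

interface LazyIframeInput {
  activeTab: TabKey;            // tab currently selected
  targetTab: TabKey;            // tab that hosts the iframe
  initialized: boolean;         // mirrors grafanaInitialized.current, etc.
  online: boolean | undefined;  // undefined while status has not loaded
}

// Hypothetical helper: true only when all three lazy-loading conditions hold.
function shouldInitIframe(input: LazyIframeInput): boolean {
  return (
    input.activeTab === input.targetTab &&
    !input.initialized &&
    input.online === true
  );
}

// Simulated session: the user opens the Alerts tab twice.
let alertmanagerInitialized = false;
for (const activeTab of ["overview", "alerts", "overview", "alerts"] as TabKey[]) {
  const init = shouldInitIframe({
    activeTab,
    targetTab: "alerts",
    initialized: alertmanagerInitialized,
    online: true,
  });
  if (init) alertmanagerInitialized = true; // src would be set here, once
  console.log(activeTab, init);
}
```

Only the first visit to the Alerts tab triggers initialization; later visits reuse the already loaded iframe.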


State Management

Local State

Data State:

const [status, setStatus] = useState<ObservabilityStatus | null>(null);
const [metrics, setMetrics] = useState<MetricsSummary | null>(null);
const [alerts, setAlerts] = useState<AlertsResponse | null>(null);
const [loading, setLoading] = useState(true);

UI State:

const [activeTab, setActiveTab] = useState<TabKey>('overview');
const [grafanaIframeSrc, setGrafanaIframeSrc] = useState<string | null>(null);
const [alertmanagerIframeSrc, setAlertmanagerIframeSrc] = useState<string | null>(null);
const grafanaInitialized = useRef(false);
const alertmanagerInitialized = useRef(false);

Data Fetching

Fetch Status:

const fetchStatus = useCallback(async () => {
  try {
    const res = await api.get<ObservabilityStatus>('/observability/status');
    setStatus(res.data);
  } catch {
    // Status fetch failed — leave null
  }
}, []);

Fetch Metrics:

const fetchMetrics = useCallback(async () => {
  try {
    const res = await api.get<MetricsSummary>('/observability/metrics-summary');
    setMetrics(res.data);
  } catch {
    // Metrics fetch may fail if Prometheus is offline
  }
}, []);

Fetch Alerts:

const fetchAlerts = useCallback(async () => {
  try {
    const res = await api.get<AlertsResponse>('/observability/alerts');
    setAlerts(res.data);
  } catch {
    // Alerts fetch may fail if Alertmanager is offline
  }
}, []);

Fetch All (Parallel):

const fetchAll = useCallback(async () => {
  setLoading(true);
  await Promise.all([fetchStatus(), fetchMetrics(), fetchAlerts()]);
  setLoading(false);
}, [fetchStatus, fetchMetrics, fetchAlerts]);

Benefit: Parallel API calls load faster than sequential.

Lazy Iframe Loading

useEffect(() => {
  if (activeTab === 'monitoring' && !grafanaInitialized.current && status?.grafana.online) {
    try {
      const url = buildMonitoringUrl('grafana', 3005, '/d/application-overview');
      setGrafanaIframeSrc(url);
      grafanaInitialized.current = true;
    } catch (error) {
      console.error('Failed to construct Grafana URL:', error);
    }
  }
}, [activeTab, status]);

Why Lazy Loading?

  • Avoids loading heavy iframes until needed
  • Improves initial page load performance
  • Saves bandwidth if user never clicks Monitoring/Alerts tabs

Why useRef?

  • Tracks initialization state without triggering re-renders
  • Prevents redundant iframe loads on subsequent tab switches

API Integration

Endpoints Used

GET /observability/status - Fetch service online/offline status

const { data } = await api.get<ObservabilityStatus>('/observability/status');

Response:

{
  "prometheus": {
    "online": true,
    "url": "http://localhost:9090"
  },
  "grafana": {
    "online": true,
    "url": "http://localhost:3001"
  },
  "alertmanager": {
    "online": true,
    "url": "http://localhost:9093"
  },
  "cadvisor": {
    "online": true,
    "url": "http://localhost:8080"
  },
  "nodeExporter": {
    "online": true,
    "url": "http://localhost:9100"
  },
  "redisExporter": {
    "online": true,
    "url": "http://localhost:9121"
  },
  "gotify": {
    "online": false,
    "url": "http://localhost:8889"
  }
}
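
The response shape above suggests types roughly like the following. The field names are inferred from the sample JSON rather than taken from the actual type definitions, and offlineServices is a hypothetical helper added for illustration.

```typescript
// Shape of one service entry, inferred from the sample response.
interface ServiceStatus {
  online: boolean;
  url: string;
}

type ServiceKey =
  | "prometheus"
  | "grafana"
  | "alertmanager"
  | "cadvisor"
  | "nodeExporter"
  | "redisExporter"
  | "gotify";

// Assumed shape of ObservabilityStatus: one entry per monitoring service.
type ObservabilityStatus = Record<ServiceKey, ServiceStatus>;

// Hypothetical helper: keys of the services currently reported offline.
function offlineServices(status: ObservabilityStatus): ServiceKey[] {
  return (Object.keys(status) as ServiceKey[]).filter(
    (key) => !status[key].online,
  );
}

// Same data as the sample response.
const sampleStatus: ObservabilityStatus = {
  prometheus: { online: true, url: "http://localhost:9090" },
  grafana: { online: true, url: "http://localhost:3001" },
  alertmanager: { online: true, url: "http://localhost:9093" },
  cadvisor: { online: true, url: "http://localhost:8080" },
  nodeExporter: { online: true, url: "http://localhost:9100" },
  redisExporter: { online: true, url: "http://localhost:9121" },
  gotify: { online: false, url: "http://localhost:8889" },
};

console.log(offlineServices(sampleStatus));
```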

GET /observability/metrics-summary - Fetch key application metrics

const { data } = await api.get<MetricsSummary>('/observability/metrics-summary');

Response:

{
  "apiUptime": 99.8,
  "emailQueueSize": 42,
  "activeCanvassSessions": 5,
  "totalLocations": 12543,
  "httpRequestsTotal": 156789,
  "httpRequestDurationSeconds": 0.234
}
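
A typed view of this response, with display formatting, might look like the sketch below. The interface is inferred from the sample JSON, and the units are assumptions: apiUptime is treated as a percentage and httpRequestDurationSeconds as a duration in seconds. formatMetrics is a hypothetical helper, not part of MetricsGrid.

```typescript
// Assumed shape of MetricsSummary, inferred from the sample response.
interface MetricsSummary {
  apiUptime: number;                  // assumed to be a percentage
  emailQueueSize: number;
  activeCanvassSessions: number;
  totalLocations: number;
  httpRequestsTotal: number;
  httpRequestDurationSeconds: number; // assumed to be seconds
}

// Hypothetical helper: turn raw numbers into display strings.
function formatMetrics(m: MetricsSummary): Record<string, string> {
  return {
    apiUptime: `${m.apiUptime.toFixed(1)}%`,
    emailQueueSize: m.emailQueueSize.toLocaleString("en-US"),
    activeCanvassSessions: m.activeCanvassSessions.toLocaleString("en-US"),
    totalLocations: m.totalLocations.toLocaleString("en-US"),
    httpRequestsTotal: m.httpRequestsTotal.toLocaleString("en-US"),
    // Convert seconds to whole milliseconds for readability.
    httpRequestDuration: `${Math.round(m.httpRequestDurationSeconds * 1000)} ms`,
  };
}

// Same data as the sample response.
const sampleMetrics: MetricsSummary = {
  apiUptime: 99.8,
  emailQueueSize: 42,
  activeCanvassSessions: 5,
  totalLocations: 12543,
  httpRequestsTotal: 156789,
  httpRequestDurationSeconds: 0.234,
};

console.log(formatMetrics(sampleMetrics));
```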

GET /observability/alerts - Fetch active alerts

const { data } = await api.get<AlertsResponse>('/observability/alerts');

Response:

{
  "alerts": [
    {
      "id": "alert_1",
      "name": "HighMemoryUsage",
      "severity": "warning",
      "status": "firing",
      "startTime": "2026-02-11T10:30:00Z",
      "labels": {
        "alertname": "HighMemoryUsage",
        "instance": "api:4000",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Memory usage above 80%",
        "description": "API container using 85% memory"
      }
    }
  ]
}
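
Based on the sample above, the alert payload could be typed and pre-sorted roughly as follows. The interfaces are inferred from the JSON, not copied from the real type definitions, and AlertItem, firingBySeverity, and the ordering rule (most severe first, then oldest first) are assumptions made for illustration.

```typescript
// Shapes inferred from the sample response.
interface AlertItem {
  id: string;
  name: string;
  severity: string;
  status: string; // "firing" or "resolved"
  startTime: string; // ISO 8601, UTC
  labels: Record<string, string>;
  annotations: Record<string, string>;
}

interface AlertsResponse {
  alerts: AlertItem[];
}

const SEVERITY_RANK: Record<string, number> = { critical: 0, warning: 1, info: 2 };

// Hypothetical helper: firing alerts only, most severe first, then oldest first.
function firingBySeverity(response: AlertsResponse): AlertItem[] {
  return response.alerts
    .filter((alert) => alert.status === "firing") // filter returns a new array
    .sort((a, b) => {
      const rankDiff =
        (SEVERITY_RANK[a.severity] ?? 3) - (SEVERITY_RANK[b.severity] ?? 3);
      // Timestamps in the same ISO 8601 UTC format sort lexicographically.
      return rankDiff !== 0 ? rankDiff : a.startTime.localeCompare(b.startTime);
    });
}

// Illustrative data: only alert_1 comes from the sample response.
const sampleAlerts: AlertsResponse = {
  alerts: [
    { id: "alert_1", name: "HighMemoryUsage", severity: "warning", status: "firing",
      startTime: "2026-02-11T10:30:00Z", labels: {}, annotations: {} },
    { id: "alert_2", name: "ExampleCritical", severity: "critical", status: "firing",
      startTime: "2026-02-11T11:00:00Z", labels: {}, annotations: {} },
    { id: "alert_3", name: "ExampleResolved", severity: "info", status: "resolved",
      startTime: "2026-02-11T09:00:00Z", labels: {}, annotations: {} },
  ],
};

console.log(firingBySeverity(sampleAlerts).map((alert) => alert.id));
```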

Code Examples

Parallel API Calls

const fetchAll = useCallback(async () => {
  setLoading(true);
  await Promise.all([fetchStatus(), fetchMetrics(), fetchAlerts()]);
  setLoading(false);
}, [fetchStatus, fetchMetrics, fetchAlerts]);

Benefit: Loads all data simultaneously (faster than sequential).

Lazy Iframe Loading Pattern

const grafanaInitialized = useRef(false);

useEffect(() => {
  if (activeTab === 'monitoring' && !grafanaInitialized.current && status?.grafana.online) {
    const url = buildMonitoringUrl('grafana', 3005, '/d/application-overview');
    setGrafanaIframeSrc(url);
    grafanaInitialized.current = true;
  }
}, [activeTab, status]);

Pattern:

  1. Check if tab active
  2. Check if not already initialized (useRef)
  3. Check if service online
  4. Build URL and set iframe src
  5. Mark as initialized (prevents redundant loads)

Services Online Count

const servicesOnline = status
  ? Object.values(status).filter((s: ServiceStatus) => s.online).length
  : 0;
const allOffline = servicesOnline === 0;

Counts online services from status object values.

Conditional Rendering Based on Service Status

{allOffline && (
  <Alert
    message="Monitoring services are offline"
    description={<>Start with: <code>docker compose --profile monitoring up -d</code></>}
    type="warning"
  />
)}

{!allOffline && <MetricsGrid metrics={metrics} loading={loading} />}
{!allOffline && alerts && <AlertsTable alerts={alerts.alerts || []} loading={loading} />}

Pattern: Show banner if all offline, hide metrics/alerts if all offline.


Performance Considerations

Parallel API Calls

Three API calls made simultaneously instead of sequentially:

await Promise.all([fetchStatus(), fetchMetrics(), fetchAlerts()]);

Benefit: If each request takes about 100 ms, total load time drops from roughly 300 ms (100 ms × 3) to roughly 100 ms (the slowest of the three parallel requests).
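
The arithmetic behind this estimate generalizes to unequal latencies: sequential time is the sum, parallel time is the maximum. The sketch below uses made-up latency figures and hypothetical helper names, and ignores connection limits and server-side contention.

```typescript
// Hypothetical helper: total wait when requests run one after another.
function sequentialEstimateMs(latenciesMs: number[]): number {
  return latenciesMs.reduce((total, latency) => total + latency, 0);
}

// Hypothetical helper: total wait when requests run concurrently,
// bounded by the slowest request.
function parallelEstimateMs(latenciesMs: number[]): number {
  return latenciesMs.length === 0 ? 0 : Math.max(...latenciesMs);
}

// Illustrative latencies for status, metrics, and alerts (not measurements).
const equal = [100, 100, 100];
const uneven = [80, 250, 120];

console.log(sequentialEstimateMs(equal), parallelEstimateMs(equal));
console.log(sequentialEstimateMs(uneven), parallelEstimateMs(uneven));
```

When one endpoint is much slower than the others, the parallel total tracks that endpoint, so the saving is smaller than the equal-latency case suggests.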

Lazy Iframe Loading

Iframes only load when tab selected:

  • Grafana iframe: activeTab === 'monitoring'
  • Alertmanager iframe: activeTab === 'alerts'

Benefit: Saves bandwidth and reduces initial page load time. Heavy iframes (~1-2MB each) not loaded unless needed.

useRef for Initialization Tracking

const grafanaInitialized = useRef(false);

Why useRef instead of useState?

  • Doesn't trigger re-renders when updated
  • Persists across re-renders
  • Perfect for tracking initialization state

Conditional Component Rendering

{!allOffline && <MetricsGrid metrics={metrics} loading={loading} />}

Avoids rendering heavy components when no services online (no data to show).


Responsive Design

Service Status Grid

<Row gutter={[16, 16]}>
  <Col xs={24} sm={12} lg={6}>
    <ServiceStatusCard ... />
  </Col>
  {/* 6 more cards... */}
</Row>

Responsive Breakpoints:

  • Desktop (lg, ≥ 992px): 4 columns (6/24 each)
  • Tablet (sm, ≥ 576px): 2 columns (12/24 each)
  • Mobile (xs, < 576px): 1 column (24/24 each)

Iframe Height

<iframe style={{ height: 'calc(100vh - 200px)' }} />

Dynamic height: Fills viewport minus header/footer (responsive to window resize).


Accessibility

Iframe Labels

<iframe
  title="Grafana Dashboard"
  aria-label="Embedded Grafana application overview dashboard"
/>

Screen reader support: Clear description of iframe content.

Button Labels

<Button icon={<ReloadOutlined />}>Refresh</Button>
<Button icon={<LinkOutlined />}>Open Grafana</Button>

Buttons are not icon-only: each has a text label for clarity.

Service Status Badges

<Badge status="success" text="Online" />
<Badge status="error" text="Offline" />

Color + text: Not relying on color alone for status indication.


Troubleshooting

All Services Offline

Symptoms:

  • Warning banner at top
  • All service status cards show red "Offline"
  • No metrics or alerts displayed

Cause: Monitoring services not started (Docker Compose profile monitoring not active)

Solution:

# Start monitoring services
docker compose --profile monitoring up -d

# Verify services running
docker compose ps | grep -E "(prometheus|grafana|alertmanager)"

# Check logs if services fail to start
docker compose logs prometheus grafana alertmanager

Grafana/Alertmanager Iframe Not Loading

Symptoms:

  • Blank iframe or loading spinner forever
  • Console errors about iframe src

Causes:

  1. Service offline (check Overview tab status)
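  2. Framing blocked by response headers (X-Frame-Options or a Content-Security-Policy frame-ancestors directive); Grafana, for example, refuses to be embedded unless its allow_embedding setting is enabled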
  3. Network error

Debug:

# Check Grafana container
docker compose logs grafana

# Test Grafana directly
curl http://localhost:3001

# Check nginx proxy (if using)
docker compose logs nginx | grep grafana
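
A blank iframe can also be caused by response headers that forbid framing: browsers refuse to render a framed page that sends X-Frame-Options or a Content-Security-Policy frame-ancestors directive that excludes the embedding page. The sketch below inspects a set of response headers (for example, copied from the browser's network tab) for either one. framingRestriction is a hypothetical helper, and it only reports that a restriction is present, not whether it applies to a given origin.

```typescript
// Hypothetical helper: returns the header or directive that restricts
// framing, or null if neither is present in the given response headers.
function framingRestriction(headers: Record<string, string>): string | null {
  // Header names are case-insensitive, so normalize the keys first.
  const normalized: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = value;
  }

  const xFrameOptions = normalized["x-frame-options"];
  if (xFrameOptions) return `X-Frame-Options: ${xFrameOptions}`;

  const csp = normalized["content-security-policy"];
  if (!csp) return null;

  // CSP directives are separated by semicolons.
  const directive = csp
    .split(";")
    .map((part) => part.trim())
    .find((part) => part.toLowerCase().startsWith("frame-ancestors"));
  return directive ?? null;
}

console.log(framingRestriction({ "X-Frame-Options": "deny" }));
console.log(
  framingRestriction({
    "Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'",
  }),
);
console.log(framingRestriction({ "Content-Type": "text/html" }));
```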

Metrics Not Showing

Symptoms:

  • MetricsGrid empty or shows zeros
  • "Failed to load metrics" error

Cause: Prometheus offline or not scraping metrics

Solutions:

# Check Prometheus status
curl http://localhost:9090/-/healthy

# Check Prometheus targets (should show API as "up")
curl http://localhost:9090/api/v1/targets

# Verify API is exposing /metrics endpoint
curl http://localhost:4000/metrics

Alerts Not Showing

Symptoms:

  • AlertsTable empty
  • No alerts firing (but should be)

Causes:

  1. Alertmanager offline
  2. No alerts configured in Prometheus
  3. Alerts resolved (not firing)

Debug:

# Check Alertmanager status
curl http://localhost:9093/-/healthy

# Check Prometheus alerts
curl http://localhost:9090/api/v1/alerts

# Check alert rules config
docker compose exec api cat /app/configs/prometheus/alerts.yml

"Open Grafana" Button Not Visible

Cause: Grafana offline

Expected Behavior:

{status?.grafana.online && (
  <Button href={status.grafana.url} target="_blank">
    Open Grafana
  </Button>
)}

Button only shows when Grafana online.

