Data Quality Dashboard¶
Overview¶
The Data Quality Dashboard provides comprehensive monitoring and management of geocoding accuracy and location data integrity. This feature enables campaign administrators to identify and resolve data quality issues, track geocoding provider performance, and ensure reliable map data for canvassing operations.
Key Features:
- Real-time geocoding quality metrics
- Provider success rate tracking
- Low-confidence location detection
- Duplicate location identification
- Bulk re-geocoding operations
- Address validation reporting
- Interactive quality charts
- Export quality reports
Use Cases:
- Monthly data quality audits
- NAR import validation
- Geocoding provider evaluation
- Pre-canvass data verification
- Address database cleanup
- Campaign planning accuracy checks
Architecture Highlights:
- Aggregate statistics via database queries
- Confidence threshold filtering (0-100 scale)
- Provider performance comparison
- Duplicate detection via coordinate matching
- Manual review workflows
- Prometheus metrics integration
Architecture¶
flowchart TB
subgraph Admin Interface
Admin[Admin User]
Dashboard[DataQualityDashboardPage]
LocationsPage[LocationsPage]
end
subgraph API Layer
StatsAPI["/api/locations/geocode-stats"]
LocationsAPI["/api/locations"]
DuplicatesAPI["/api/locations/duplicates"]
RegeocodeAPI["/api/locations/:id/regeocode"]
BulkGeocodeAPI["/api/locations/bulk-geocode"]
end
subgraph Database
LocationsDB[(Locations)]
Indexes[(Indexes)]
end
subgraph Geocoding Service
GeocodingService[GeocodingService]
Providers[6 Providers]
Cache[Redis Cache]
end
subgraph Monitoring
Prometheus[Prometheus]
Metrics[cm_locations_low_confidence_count]
end
Admin --> Dashboard
Admin --> LocationsPage
Dashboard --> StatsAPI
Dashboard --> LocationsAPI
Dashboard --> DuplicatesAPI
LocationsPage --> RegeocodeAPI
LocationsPage --> BulkGeocodeAPI
StatsAPI --> LocationsDB
LocationsAPI --> LocationsDB
DuplicatesAPI --> LocationsDB
RegeocodeAPI --> GeocodingService
BulkGeocodeAPI --> GeocodingService
LocationsDB --> Indexes
GeocodingService --> Providers
GeocodingService --> Cache
StatsAPI --> Prometheus
Prometheus --> Metrics
Data Flow:
- Statistics Aggregation:
  - Query all locations with geocoding metadata
  - Calculate aggregate metrics (total, geocoded %, avg confidence)
  - Group by provider for success rate comparison
  - Identify low-confidence locations (< 50)
  - Detect duplicates via coordinate matching
- Quality Review:
  - Admin views dashboard statistics
  - Filters low-confidence locations
  - Reviews individual location details
  - Identifies patterns (provider failures, address format issues)
- Remediation:
  - Manual address correction
  - Single location re-geocoding
  - Bulk re-geocoding with different provider
  - Duplicate merging or marking
- Monitoring:
  - Prometheus metrics track quality trends
  - Alert rules trigger for quality degradation
  - Grafana dashboards visualize provider performance
Database Models¶
Location Model¶
model Location {
id Int @id @default(autoincrement())
address String
latitude Float?
longitude Float?
postalCode String?
province String?
// Geocoding metadata
geocodeConfidence Int? // 0-100 quality score
geocodeProvider String? // Provider used for geocoding
geocodedAt DateTime? // Timestamp of last geocode
// NAR import fields
locGuid String? @unique
federalDistrict String?
buildingUse Int? // 1 = Residential
addresses Address[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@index([geocodeConfidence])
@@index([geocodeProvider])
@@index([latitude, longitude])
@@index([latitude, longitude], where: latitude IS NOT NULL AND longitude IS NOT NULL)
}
Geocode Confidence Scale:
- 0-20: Very Low (manual review required)
- 21-40: Low (likely incorrect, re-geocode recommended)
- 41-60: Medium (acceptable but consider verification)
- 61-80: Good (likely accurate)
- 81-100: Excellent (high confidence)
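The scale can be expressed as a small lookup. This is a minimal sketch: the `confidenceBand` function and `ConfidenceBand` type are illustrative names, not identifiers from the codebase.

```typescript
// Hypothetical helper: maps a 0-100 geocode confidence score to the bands of the scale.
type ConfidenceBand = 'Very Low' | 'Low' | 'Medium' | 'Good' | 'Excellent';

function confidenceBand(score: number): ConfidenceBand {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`confidence must be within 0-100, got ${score}`);
  }
  if (score <= 20) return 'Very Low';
  if (score <= 40) return 'Low';
  if (score <= 60) return 'Medium';
  if (score <= 80) return 'Good';
  return 'Excellent';
}
```

The boundaries are inclusive on the upper end, which matches the bucket logic used for the confidence distribution.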
Geocode Provider Enum:
enum GeocodeProvider {
GOOGLE = 'GOOGLE',
MAPBOX = 'MAPBOX',
NOMINATIM = 'NOMINATIM',
PHOTON = 'PHOTON',
LOCATIONIQ = 'LOCATIONIQ',
ARCGIS = 'ARCGIS',
UNKNOWN = 'UNKNOWN'
}
Address Model¶
model Address {
id Int @id @default(autoincrement())
locationId Int
location Location @relation(fields: [locationId], references: [id], onDelete: Cascade)
unitNumber String?
firstName String?
lastName String?
supportLevel Int?
notes String?
// Address validation
isValidated Boolean @default(false)
validatedAt DateTime?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@index([locationId])
}
API Endpoints¶
GET /api/locations/geocode-stats¶
Fetch aggregate geocoding quality statistics.
Authentication: Required (SUPER_ADMIN, MAP_ADMIN)
Response:
{
"total": 1500,
"geocoded": 1450,
"geocodedPercent": 96.67,
"avgConfidence": 78.5,
"providerBreakdown": {
"GOOGLE": 800,
"MAPBOX": 350,
"NOMINATIM": 200,
"PHOTON": 100,
"ARCGIS": 0,
"LOCATIONIQ": 0,
"UNKNOWN": 50
},
"confidenceDistribution": {
"0-20": 15,
"21-40": 35,
"41-60": 150,
"61-80": 450,
"81-100": 800
},
"lowConfidenceCount": 50,
"missingCoordinates": 50,
"duplicatesCount": 12
}
Implementation:
// locations.service.ts
async getGeocodeStats() {
const locations = await prisma.location.findMany({
select: {
latitude: true,
longitude: true,
geocodeConfidence: true,
geocodeProvider: true
}
});
const total = locations.length;
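// Compare against null so that a valid 0 coordinate still counts as geocoded
const geocoded = locations.filter(l => l.latitude != null && l.longitude != null).length;
// Guard against an empty table to avoid dividing by zero
const avgConfidence = total > 0
? locations.reduce((sum, l) => sum + (l.geocodeConfidence || 0), 0) / total
: 0;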
const providerBreakdown = locations.reduce((acc, l) => {
const provider = l.geocodeProvider || 'UNKNOWN';
acc[provider] = (acc[provider] || 0) + 1;
return acc;
}, {} as Record<string, number>);
const confidenceDistribution = {
'0-20': 0,
'21-40': 0,
'41-60': 0,
'61-80': 0,
'81-100': 0
};
locations.forEach(l => {
const conf = l.geocodeConfidence || 0;
if (conf <= 20) confidenceDistribution['0-20']++;
else if (conf <= 40) confidenceDistribution['21-40']++;
else if (conf <= 60) confidenceDistribution['41-60']++;
else if (conf <= 80) confidenceDistribution['61-80']++;
else confidenceDistribution['81-100']++;
});
const lowConfidenceCount = locations.filter(l =>
(l.geocodeConfidence || 0) < 50).length;
return {
total,
geocoded,
geocodedPercent: total > 0 ? (geocoded / total) * 100 : 0,
avgConfidence,
providerBreakdown,
confidenceDistribution,
lowConfidenceCount,
missingCoordinates: total - geocoded,
duplicatesCount: await this.countDuplicates()
};
}
GET /api/locations?geocodeConfidence=lt:50¶
Fetch locations filtered by geocode confidence.
Authentication: Required
Query Parameters:
- geocodeConfidence (filter): lt:X, gt:X, eq:X, null
- geocodeProvider (filter): Provider name (GOOGLE, MAPBOX, etc.)
- page (optional): Page number (default: 1)
- limit (optional): Results per page (default: 50)
- sortBy (optional): Field to sort by (default: "geocodeConfidence")
- order (optional): "asc" or "desc" (default: "asc")
Examples:
GET /api/locations?geocodeConfidence=lt:50
GET /api/locations?geocodeConfidence=null
GET /api/locations?geocodeProvider=NOMINATIM&geocodeConfidence=lt:70
GET /api/locations?geocodeConfidence=gt:80&sortBy=address
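The filter syntax (`lt:X`, `gt:X`, `eq:X`, `null`) could be parsed along these lines. This is a sketch under assumptions: `parseConfidenceFilter` is a hypothetical name, and the Prisma-style return shape is a guess at how the service might consume it, not the actual implementation.

```typescript
// Hypothetical parser for the geocodeConfidence filter syntax (lt:X, gt:X, eq:X, null).
// The returned shape mirrors a Prisma-style `where` fragment; that mapping is an assumption.
type ConfidenceWhere = { lt: number } | { gt: number } | { equals: number } | null;

function parseConfidenceFilter(raw: string): ConfidenceWhere {
  if (raw === 'null') return null; // locations that have never been scored
  const match = /^(lt|gt|eq):(\d{1,3})$/.exec(raw);
  if (!match) {
    throw new Error(`Unsupported geocodeConfidence filter: ${raw}`);
  }
  const value = Number(match[2]);
  if (value > 100) {
    throw new Error(`Confidence must be within 0-100, got ${value}`);
  }
  switch (match[1]) {
    case 'lt': return { lt: value };
    case 'gt': return { gt: value };
    default: return { equals: value };
  }
}

const lowConfidence = parseConfidenceFilter('lt:50'); // { lt: 50 }
const unscored = parseConfidenceFilter('null');       // null
```

Rejecting malformed filters up front keeps invalid values out of the database query.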
Response:
{
"data": [
{
"id": 1001,
"address": "123 Main St",
"latitude": 43.6532,
"longitude": -79.3832,
"postalCode": "M5H 2N2",
"geocodeConfidence": 45,
"geocodeProvider": "NOMINATIM",
"geocodedAt": "2025-02-10T10:00:00Z",
"addresses": [...]
}
],
"pagination": {
"page": 1,
"limit": 50,
"total": 150,
"pages": 3
}
}
GET /api/locations/duplicates¶
Identify locations with identical coordinates.
Authentication: Required (SUPER_ADMIN, MAP_ADMIN)
Query Parameters:
- threshold (optional): Distance threshold in meters (default: 1, matches exact duplicates)
Response:
{
"duplicates": [
{
"coordinates": {
"latitude": 43.6532,
"longitude": -79.3832
},
"count": 3,
"locations": [
{
"id": 1001,
"address": "123 Main St",
"postalCode": "M5H 2N2"
},
{
"id": 1002,
"address": "123 Main Street",
"postalCode": "M5H 2N2"
},
{
"id": 1003,
"address": "123 Main St, Unit 1",
"postalCode": "M5H 2N2"
}
]
}
],
"total": 12
}
Implementation:
// locations.service.ts
async findDuplicates(thresholdMeters: number = 1) {
const locations = await prisma.location.findMany({
where: {
AND: [
{ latitude: { not: null } },
{ longitude: { not: null } }
]
},
select: {
id: true,
address: true,
latitude: true,
longitude: true,
postalCode: true
}
});
const coordMap = new Map<string, typeof locations>();
locations.forEach(loc => {
// Group by coordinates rounded to 6 decimal places (~0.1 m precision).
// Note: thresholdMeters is not applied in this implementation.
const key = `${loc.latitude!.toFixed(6)},${loc.longitude!.toFixed(6)}`;
if (!coordMap.has(key)) {
coordMap.set(key, []);
}
coordMap.get(key)!.push(loc);
});
const duplicates = Array.from(coordMap.entries())
.filter(([_, locs]) => locs.length > 1)
.map(([coords, locs]) => {
const [lat, lng] = coords.split(',').map(Number);
return {
coordinates: { latitude: lat, longitude: lng },
count: locs.length,
locations: locs
};
});
return {
duplicates,
total: duplicates.reduce((sum, dup) => sum + dup.count, 0)
};
}
POST /api/locations/:id/regeocode¶
Re-geocode a single location with specified provider.
Authentication: Required (SUPER_ADMIN, MAP_ADMIN)
Request Body:
Parameters:
- provider (optional): Specific provider to use (default: fallback chain)
- address (optional): Override address string (default: use existing)
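A request for this endpoint might be assembled as sketched below. The `RegeocodeRequest` shape is inferred from the parameter list, and `buildRegeocodeCall` is a hypothetical helper; neither is taken from the codebase.

```typescript
// Assumed request shape, derived from the parameter list; both fields are optional.
interface RegeocodeRequest {
  provider?: string; // e.g. 'GOOGLE'; omit to use the fallback chain
  address?: string;  // override address; omit to reuse the stored address
}

// Hypothetical helper that builds the path and JSON body for a single re-geocode call.
function buildRegeocodeCall(locationId: number, options: RegeocodeRequest = {}) {
  if (!Number.isInteger(locationId) || locationId <= 0) {
    throw new Error(`Invalid location id: ${locationId}`);
  }
  const body: RegeocodeRequest = {};
  if (options.provider) body.provider = options.provider.toUpperCase();
  if (options.address && options.address.trim() !== '') body.address = options.address.trim();
  return { method: 'POST' as const, path: `/api/locations/${locationId}/regeocode`, body };
}

const call = buildRegeocodeCall(1001, { provider: 'google' });
// call.path is '/api/locations/1001/regeocode'; call.body is { provider: 'GOOGLE' }
```

An empty body falls back to the stored address and the default provider chain, per the parameter defaults.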
Response:
{
"id": 1001,
"address": "123 Main St",
"latitude": 43.6532,
"longitude": -79.3832,
"geocodeConfidence": 95,
"geocodeProvider": "GOOGLE",
"geocodedAt": "2025-02-13T10:30:00Z"
}
POST /api/locations/bulk-geocode¶
Bulk re-geocode multiple locations.
Authentication: Required (SUPER_ADMIN, MAP_ADMIN)
Request Body:
Parameters:
- locationIds (optional): Specific location IDs (default: all with confidence < threshold)
- provider (optional): Specific provider to use (default: fallback chain)
- confidenceThreshold (optional): Only re-geocode locations below this confidence (default: 50)
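How the three parameters interact can be sketched as a selection function. The names `BulkGeocodeRequest`, `LocationSummary`, and `selectBulkTargets` are hypothetical. The sketch also makes two assumptions: the threshold still applies when explicit IDs are given, and unscored locations count as below the threshold.

```typescript
interface BulkGeocodeRequest {
  locationIds?: number[];
  provider?: string;
  confidenceThreshold?: number; // defaults to 50
}

interface LocationSummary {
  id: number;
  geocodeConfidence: number | null;
}

// Hypothetical selection logic: restrict to the requested IDs when given, then keep
// only locations below the threshold. Null confidence is treated as 0 (an assumption).
function selectBulkTargets(locations: LocationSummary[], req: BulkGeocodeRequest): number[] {
  const threshold = req.confidenceThreshold ?? 50;
  const requested = req.locationIds && req.locationIds.length > 0 ? req.locationIds : null;
  return locations
    .filter(l => requested === null || requested.indexOf(l.id) !== -1)
    .filter(l => (l.geocodeConfidence ?? 0) < threshold)
    .map(l => l.id);
}

const sample: LocationSummary[] = [
  { id: 1, geocodeConfidence: 45 },
  { id: 2, geocodeConfidence: 90 },
  { id: 3, geocodeConfidence: null },
  { id: 4, geocodeConfidence: 30 },
];
const defaultTargets = selectBulkTargets(sample, {}); // [1, 3, 4]
```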
Response:
{
"jobId": "bulk-geocode-20250213-103000",
"status": "queued",
"total": 150,
"message": "Bulk geocoding job started"
}
Job Progress Endpoint:
Job Status Response:
{
"jobId": "bulk-geocode-20250213-103000",
"status": "processing",
"progress": {
"total": 150,
"processed": 75,
"successful": 70,
"failed": 5,
"percent": 50
},
"startedAt": "2025-02-13T10:30:00Z",
"estimatedCompletion": "2025-02-13T10:35:00Z"
}
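The `percent` and `estimatedCompletion` fields could be derived from the raw counters as sketched here. `summarizeProgress` is a hypothetical helper, and linear extrapolation is an assumed estimation method; the actual job runner may compute the estimate differently.

```typescript
interface JobProgress {
  total: number;
  processed: number;
  successful: number;
  failed: number;
}

// Hypothetical helper: derives `percent` and a linear-extrapolation ETA from raw counters.
function summarizeProgress(progress: JobProgress, startedAt: Date, now: Date) {
  const percent = progress.total > 0
    ? Math.round((progress.processed / progress.total) * 100)
    : 0;
  let estimatedCompletion: Date | null = null;
  if (progress.processed > 0 && progress.total > 0) {
    const elapsedMs = now.getTime() - startedAt.getTime();
    const msPerItem = elapsedMs / progress.processed;
    estimatedCompletion = new Date(startedAt.getTime() + msPerItem * progress.total);
  }
  return { percent, estimatedCompletion };
}

const summary = summarizeProgress(
  { total: 150, processed: 75, successful: 70, failed: 5 },
  new Date('2025-02-13T10:30:00Z'),
  new Date('2025-02-13T10:32:30Z')
);
// 75 of 150 items in 150 s is 2 s per item, so the ETA lands at 10:35:00Z
```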
Configuration¶
Environment Variables¶
| Variable | Type | Default | Description |
|---|---|---|---|
| GEOCODE_CONFIDENCE_THRESHOLD | number | 50 | Minimum confidence for acceptable geocoding |
| GEOCODE_PRIMARY_PROVIDER | string | GOOGLE | Primary geocoding provider |
| GEOCODE_FALLBACK_PROVIDERS | string | MAPBOX,NOMINATIM | Comma-separated fallback providers |
| GEOCODE_CACHE_TTL | number | 2592000 | Cache TTL in seconds (30 days) |
Quality Thresholds¶
| Metric | Warning | Critical | Description |
|---|---|---|---|
| Geocoded % | < 95% | < 90% | Percentage of locations with coordinates |
| Avg Confidence | < 70 | < 60 | Average geocode confidence score |
| Low Confidence Count | > 50 | > 100 | Locations with confidence < 50 |
| Duplicates | > 20 | > 50 | Locations with identical coordinates |
| Missing Coordinates | > 5% | > 10% | Locations without lat/lng |
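The thresholds in the table could be checked programmatically, for example during a monthly audit. In this sketch the threshold values are copied from the table, while `evaluateQuality` and the helper names are hypothetical. The Missing Coordinates row is omitted because it is the complement of Geocoded %.

```typescript
type Severity = 'ok' | 'warning' | 'critical';

interface QualityStats {
  geocodedPercent: number;
  avgConfidence: number;
  lowConfidenceCount: number;
  duplicatesCount: number;
}

// For metrics where a lower value is worse (percentages, averages).
function gradeLowerIsWorse(value: number, warnBelow: number, critBelow: number): Severity {
  if (value < critBelow) return 'critical';
  if (value < warnBelow) return 'warning';
  return 'ok';
}

// For metrics where a higher value is worse (counts of problem records).
function gradeHigherIsWorse(value: number, warnAbove: number, critAbove: number): Severity {
  if (value > critAbove) return 'critical';
  if (value > warnAbove) return 'warning';
  return 'ok';
}

function evaluateQuality(stats: QualityStats): Record<keyof QualityStats, Severity> {
  return {
    geocodedPercent: gradeLowerIsWorse(stats.geocodedPercent, 95, 90),
    avgConfidence: gradeLowerIsWorse(stats.avgConfidence, 70, 60),
    lowConfidenceCount: gradeHigherIsWorse(stats.lowConfidenceCount, 50, 100),
    duplicatesCount: gradeHigherIsWorse(stats.duplicatesCount, 20, 50),
  };
}

// Values from the sample geocode-stats response: every metric grades as 'ok'.
const report = evaluateQuality({
  geocodedPercent: 96.67,
  avgConfidence: 78.5,
  lowConfidenceCount: 50,
  duplicatesCount: 12,
});
```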
Prometheus Metrics¶
Custom Metrics:
// api/src/utils/metrics.ts
import { Gauge } from 'prom-client';
export const geocodingQualityGauge = new Gauge({
name: 'cm_geocoding_avg_confidence',
help: 'Average geocoding confidence score (0-100)',
async collect() {
const stats = await locationsService.getGeocodeStats();
this.set(stats.avgConfidence);
}
});
export const lowConfidenceLocationsGauge = new Gauge({
name: 'cm_locations_low_confidence_count',
help: 'Number of locations with geocode confidence < 50',
async collect() {
const stats = await locationsService.getGeocodeStats();
this.set(stats.lowConfidenceCount);
}
});
export const geocodedPercentGauge = new Gauge({
name: 'cm_locations_geocoded_percent',
help: 'Percentage of locations with coordinates',
async collect() {
const stats = await locationsService.getGeocodeStats();
this.set(stats.geocodedPercent);
}
});
export const duplicateLocationsGauge = new Gauge({
name: 'cm_locations_duplicates_count',
help: 'Number of duplicate location entries',
async collect() {
const duplicates = await locationsService.findDuplicates();
this.set(duplicates.total);
}
});
Alert Rules:
# configs/prometheus/alerts.yml
groups:
- name: data_quality
interval: 5m
rules:
- alert: LowGeocodingConfidence
expr: cm_geocoding_avg_confidence < 60
for: 10m
labels:
severity: warning
annotations:
summary: Low average geocoding confidence
description: "Average geocoding confidence is {{ $value }}, below threshold of 60"
- alert: HighLowConfidenceLocations
expr: cm_locations_low_confidence_count > 100
for: 5m
labels:
severity: critical
annotations:
summary: High number of low-confidence locations
description: "{{ $value }} locations have geocoding confidence < 50"
- alert: LowGeocodedPercent
expr: cm_locations_geocoded_percent < 90
for: 10m
labels:
severity: warning
annotations:
summary: Low percentage of geocoded locations
description: "Only {{ $value }}% of locations have coordinates"
- alert: HighDuplicateLocations
expr: cm_locations_duplicates_count > 50
for: 15m
labels:
severity: warning
annotations:
summary: High number of duplicate locations
description: "{{ $value }} duplicate location entries detected"
Quality Metrics¶
Geocoding Confidence¶
Calculation:
Geocoding confidence is calculated based on multiple factors:
interface GeocodeResult {
latitude: number;
longitude: number;
matchType: 'exact' | 'interpolated' | 'approximate' | 'fallback';
addressComponents: {
streetNumber?: string;
street?: string;
city?: string;
postalCode?: string;
province?: string;
};
providerConfidence?: number; // Provider-specific score
}
function calculateConfidence(result: GeocodeResult, inputAddress: string): number {
let confidence = 0;
// Match type (0-40 points)
switch (result.matchType) {
case 'exact': confidence += 40; break;
case 'interpolated': confidence += 30; break;
case 'approximate': confidence += 20; break;
case 'fallback': confidence += 10; break;
}
// Address component completeness (0-30 points)
const components = result.addressComponents;
if (components.streetNumber) confidence += 10;
if (components.street) confidence += 10;
if (components.postalCode) confidence += 10;
// Provider-specific confidence (0-30 points)
if (result.providerConfidence) {
confidence += (result.providerConfidence / 100) * 30;
}
return Math.min(Math.round(confidence), 100);
}
Confidence Levels:
- 81-100 (Excellent): Exact match with full address components
- 61-80 (Good): Interpolated match with most components
- 41-60 (Medium): Approximate match, missing some components
- 21-40 (Low): Fallback geocoding, significant uncertainty
- 0-20 (Very Low): Minimal match, likely incorrect
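Two worked examples make the point arithmetic concrete. The block below is a condensed restatement of the scoring rules for illustration only; `scoreBreakdown`, `ScoreInput`, and `MATCH_POINTS` are hypothetical names.

```typescript
type MatchType = 'exact' | 'interpolated' | 'approximate' | 'fallback';

interface ScoreInput {
  matchType: MatchType;
  hasStreetNumber: boolean;
  hasStreet: boolean;
  hasPostalCode: boolean;
  providerConfidence?: number; // 0-100, provider-specific
}

// Match type contributes up to 40 points.
const MATCH_POINTS: Record<MatchType, number> = {
  exact: 40,
  interpolated: 30,
  approximate: 20,
  fallback: 10,
};

// Returns each component of the score so the arithmetic is visible.
function scoreBreakdown(input: ScoreInput) {
  const match = MATCH_POINTS[input.matchType];
  const components =
    (input.hasStreetNumber ? 10 : 0) +
    (input.hasStreet ? 10 : 0) +
    (input.hasPostalCode ? 10 : 0);
  const provider = ((input.providerConfidence ?? 0) / 100) * 30;
  const total = Math.min(Math.round(match + components + provider), 100);
  return { match, components, provider, total };
}

// Exact match, all three components, provider score 90: 40 + 30 + 27 = 97
const best = scoreBreakdown({
  matchType: 'exact', hasStreetNumber: true, hasStreet: true, hasPostalCode: true, providerConfidence: 90,
});
// Approximate match, street name only, no provider score: 20 + 10 + 0 = 30
const weak = scoreBreakdown({
  matchType: 'approximate', hasStreetNumber: false, hasStreet: true, hasPostalCode: false,
});
```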
Provider Success Rates¶
Metrics Tracked:
interface ProviderMetrics {
provider: GeocodeProvider;
totalAttempts: number;
successfulGeocodes: number;
successRate: number; // 0-100%
avgConfidence: number; // 0-100
avgResponseTime: number; // milliseconds
errorCount: number;
lastError?: string;
}
Success Rate Calculation:
const calculateProviderMetrics = async (): Promise<ProviderMetrics[]> => {
const locations = await prisma.location.findMany({
select: {
geocodeProvider: true,
geocodeConfidence: true,
latitude: true,
longitude: true
}
});
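// groupBy is assumed to come from lodash; null providers are grouped under 'UNKNOWN'
const providerGroups = groupBy(locations, l => l.geocodeProvider ?? 'UNKNOWN');
return Object.entries(providerGroups).map(([provider, locs]) => {
const total = locs.length;
// Compare against null so that a valid 0 coordinate still counts as geocoded
const successful = locs.filter(l => l.latitude != null && l.longitude != null).length;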
const avgConf = locs.reduce((sum, l) => sum + (l.geocodeConfidence || 0), 0) / total;
return {
provider: provider as GeocodeProvider,
totalAttempts: total,
successfulGeocodes: successful,
successRate: (successful / total) * 100,
avgConfidence: avgConf,
avgResponseTime: 0, // Would need separate tracking
errorCount: total - successful
};
});
};
Duplicate Detection¶
Detection Methods:
- Exact Coordinate Match:
- Proximity Threshold:
- Address Similarity:
import { distance as levenshteinDistance } from 'fastest-levenshtein';

const isDuplicateAddress = (addr1: string, addr2: string): boolean => {
  const normalized1 = normalizeAddress(addr1);
  const normalized2 = normalizeAddress(addr2);
  const dist = levenshteinDistance(normalized1, normalized2);
  const similarity = 1 - (dist / Math.max(normalized1.length, normalized2.length));
  return similarity > 0.9; // 90% similar
};

const normalizeAddress = (address: string): string => {
  return address
    .toLowerCase()
    .replace(/\bstreet\b/g, 'st')
    .replace(/\bavenue\b/g, 'ave')
    .replace(/\broad\b/g, 'rd')
    .replace(/\bdrive\b/g, 'dr')
    .replace(/[^a-z0-9]/g, '');
};
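For the Proximity Threshold method, one common approach is the haversine great-circle distance. The sketch below is not taken from the codebase: `haversineMeters` and `findNearbyPairs` are hypothetical names, and the pairwise comparison is an assumed strategy that only suits small datasets.

```typescript
interface Point {
  id: number;
  latitude: number;
  longitude: number;
}

const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters

// Great-circle distance between two points using the haversine formula.
function haversineMeters(a: Point, b: Point): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.latitude - a.latitude);
  const dLng = toRad(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) * Math.sin(dLat / 2) +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) *
    Math.sin(dLng / 2) * Math.sin(dLng / 2);
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Hypothetical pairwise check: O(n^2) comparisons, so larger datasets would need
// a spatial index or a database-side distance query instead.
function findNearbyPairs(points: Point[], thresholdMeters: number): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  points.forEach((p, i) => {
    points.slice(i + 1).forEach(q => {
      if (haversineMeters(p, q) <= thresholdMeters) {
        pairs.push([p.id, q.id]);
      }
    });
  });
  return pairs;
}
```

A difference of 0.00001 degrees of latitude is roughly 1.1 m, so a 1 m threshold only catches near-identical coordinates while a 2 m threshold would also pair points that differ in the fifth decimal place.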
Address Validation¶
Validation Checks:
interface AddressValidationResult {
isValid: boolean;
issues: string[];
suggestions?: string[];
}
const validateAddress = (address: string): AddressValidationResult => {
const issues: string[] = [];
// Check minimum length
if (address.length < 5) {
issues.push('Address too short');
}
// Check for street number
if (!/^\d+/.test(address)) {
issues.push('Missing street number');
}
// Check for street name
if (!/\d+\s+([A-Za-z]+\s*)+/.test(address)) {
issues.push('Missing street name');
}
// Check for postal code (Canadian format)
if (!/[A-Z]\d[A-Z]\s?\d[A-Z]\d/.test(address)) {
issues.push('Missing or invalid postal code');
}
// Check for unusual characters
if (/[^A-Za-z0-9\s,.-]/.test(address)) {
issues.push('Contains unusual characters');
}
return {
isValid: issues.length === 0,
issues
};
};
Admin Workflow¶
Navigate to Data Quality Dashboard¶
Step 1: Access Dashboard
- Log in as SUPER_ADMIN or MAP_ADMIN
- Click Map in sidebar
- Click Data Quality submenu
- Dashboard loads with statistics
Step 2: Review Overall Statistics
Dashboard displays 4 main statistic cards:
┌──────────────────┬──────────────────┬──────────────────┬──────────────────┐
│ Total Locations │ Geocoded │ Avg Confidence │ Low Confidence │
│ 1,500 │ 1,450 (96.7%) │ 78.5 │ 50 │
└──────────────────┴──────────────────┴──────────────────┴──────────────────┘
Step 3: Analyze Provider Performance
Provider breakdown table shows:
| Provider | Count | Success Rate | Avg Confidence |
|---|---|---|---|
| GOOGLE | 800 | 99.2% | 85.3 |
| MAPBOX | 350 | 97.1% | 82.1 |
| NOMINATIM | 200 | 94.5% | 75.8 |
| PHOTON | 100 | 91.0% | 68.2 |
| UNKNOWN | 50 | N/A | 0 |
Step 4: Review Confidence Distribution
Bar chart displays confidence distribution:
Confidence Distribution
100 | ┌──────┐
80 | │ │
60 | ┌──────┤ │
40 | ┌──────┤ │ │
20 | │ │ │ │
0 └──┴──────┴──────┴──────┴──────┘
0-20 21-40 41-60 61-80 81-100
15 35 150 450 800
Identify and Review Low-Confidence Locations¶
Step 1: Filter Low-Confidence Locations
- Click Low Confidence tab on dashboard
- Table loads with locations where confidence < 50
- Sort by confidence (ascending) to prioritize worst
Step 2: Review Location Details
Click row to open detail drawer:
┌─────────────────────────────────────────┐
│ Location Details │
├─────────────────────────────────────────┤
│ Address: 123 Main St │
│ Postal Code: M5H 2N2 │
│ Coordinates: 43.6532, -79.3832 │
│ │
│ Geocoding Info: │
│ Confidence: 45 (Low) │
│ Provider: NOMINATIM │
│ Geocoded: Feb 10, 2025 10:00 AM │
│ │
│ Issues: │
│ • Missing street number in response │
│ • Approximate match only │
│ │
│ [Re-geocode] [Edit Address] [View Map] │
└─────────────────────────────────────────┘
Step 3: Take Action
Options for remediation:
- Re-geocode with different provider:
  - Click Re-geocode button
  - Select provider (GOOGLE recommended for low confidence)
  - Click Geocode Now
  - New confidence displayed
- Edit address:
  - Click Edit Address
  - Correct typos or formatting issues
  - Save changes
  - Auto-triggers re-geocoding
- View on map:
  - Click View Map
  - Verify location accuracy visually
  - Drag marker to correct position if needed
Bulk Re-geocoding¶
Step 1: Select Locations
- In Low Confidence tab, use table checkboxes to select locations
- Or click Select All to select all visible
- Selected count displays: "50 selected"
Step 2: Choose Provider
- Click Bulk Re-geocode button
- Modal opens with provider selection:
┌─────────────────────────────────────┐
│ Bulk Re-geocode                     │
├─────────────────────────────────────┤
│ Re-geocode 50 locations             │
│                                     │
│ Provider: [GOOGLE ▼]                │
│                                     │
│ Options:                            │
│ ☑ Only if confidence < 50           │
│ ☑ Cache results                     │
│ ☐ Overwrite existing coordinates    │
│                                     │
│ Estimated time: ~2 minutes          │
│                                     │
│ [Cancel] [Start Re-geocoding]       │
└─────────────────────────────────────┘
Step 3: Monitor Progress
- Job starts, progress bar appears:
- Real-time updates:
  - Total processed
  - Successful geocodes
  - Failed geocodes
  - Average new confidence
Step 4: Review Results
Job completion summary:
┌─────────────────────────────────────┐
│ Bulk Re-geocode Complete │
├─────────────────────────────────────┤
│ Processed: 50 │
│ Successful: 47 (94%) │
│ Failed: 3 (6%) │
│ │
│ Quality Improvement: │
│ Avg Confidence Before: 42.5 │
│ Avg Confidence After: 81.3 │
│ Improvement: +38.8 │
│ │
│ [View Failed] [Close] │
└─────────────────────────────────────┘
Handle Duplicates¶
Step 1: View Duplicates Tab
- Click Duplicates tab on dashboard
- Table groups locations by coordinates
Step 2: Review Duplicate Groups
Table displays:
| Coordinates | Count | Addresses | Action |
|---|---|---|---|
| 43.6532, -79.3832 | 3 | 123 Main St, 123 Main Street, 123 Main St Unit 1 | [Review] |
| 43.6540, -79.3825 | 2 | 456 Bay St, 456 Bay Street | [Review] |
Step 3: Resolve Duplicates
Click Review to open resolution modal:
┌─────────────────────────────────────┐
│ Resolve Duplicates │
├─────────────────────────────────────┤
│ 3 locations at 43.6532, -79.3832 │
│ │
│ ○ Merge into single location │
│ Primary: 123 Main St │
│ Merge units from duplicates │
│ │
│ ○ Keep as separate multi-unit │
│ Mark as validated multi-unit │
│ │
│ ○ Re-geocode individually │
│ Try to get unique coordinates │
│ │
│ [Cancel] [Resolve] │
└─────────────────────────────────────┘
Resolution Options:
- Merge: Combine into single Location with multiple Address records
- Multi-unit: Mark as legitimate multi-unit building
- Re-geocode: Attempt to get unique coordinates for each
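The Merge option can be illustrated with an in-memory sketch. `mergeDuplicates` and the record types are hypothetical and simplified from the Location and Address models. In a real database the Address rows would need to be re-pointed before the duplicate Location rows are deleted, because the relation is declared with `onDelete: Cascade`.

```typescript
interface AddressRecord {
  id: number;
  locationId: number;
  unitNumber: string | null;
}

interface LocationRecord {
  id: number;
  address: string;
  addresses: AddressRecord[];
}

// Hypothetical in-memory version of the Merge option: every Address record is re-pointed
// at the primary Location, and the emptied duplicates are reported for deletion.
function mergeDuplicates(primary: LocationRecord, duplicates: LocationRecord[]) {
  const others = duplicates.filter(d => d.id !== primary.id);
  const moved = others
    .reduce<AddressRecord[]>((acc, d) => acc.concat(d.addresses), [])
    .map(a => ({ ...a, locationId: primary.id }));
  return {
    merged: { ...primary, addresses: primary.addresses.concat(moved) },
    removedLocationIds: others.map(d => d.id),
  };
}

const mergeResult = mergeDuplicates(
  { id: 1001, address: '123 Main St', addresses: [{ id: 1, locationId: 1001, unitNumber: null }] },
  [
    { id: 1002, address: '123 Main Street', addresses: [] },
    { id: 1003, address: '123 Main St, Unit 1', addresses: [{ id: 2, locationId: 1003, unitNumber: '1' }] },
  ]
);
// mergeResult.merged holds both Address records; locations 1002 and 1003 can then be removed
```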
Quality Improvement Strategies¶
Multi-Provider Geocoding¶
Fallback Chain:
// geocoding.service.ts
const PROVIDER_CHAIN: GeocodeProvider[] = [
'GOOGLE', // Primary: Best accuracy, paid
'MAPBOX', // Fallback 1: Good accuracy, paid
'NOMINATIM', // Fallback 2: Free, decent accuracy
'PHOTON', // Fallback 3: Free, lower accuracy
'ARCGIS' // Fallback 4: Free, basic accuracy
];
async geocode(address: string): Promise<GeocodeResult | null> {
for (const provider of PROVIDER_CHAIN) {
try {
const result = await this.geocodeWithProvider(address, provider);
if (result && result.confidence >= 50) {
return result; // Success, confidence acceptable
}
} catch (error) {
logger.warn(`Geocoding failed with ${provider}:`, error);
// Try next provider
}
}
return null; // No provider returned a result with confidence >= 50
}
Benefits:
- Increases success rate (90% → 96%+)
- Reduces dependency on single provider
- Cost optimization (use free providers as fallback)
- Provider outage resilience
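One possible refinement of the fallback loop is to keep the best low-confidence result instead of returning nothing when no provider reaches the threshold. The sketch below is a hypothetical variant, not the service's implementation: providers are injected as plain functions so the chain can be exercised without network access, and `geocodeWithFallback`, `ProviderFn`, and `ProviderResult` are illustrative names.

```typescript
interface ProviderResult {
  latitude: number;
  longitude: number;
  confidence: number;
  provider: string;
}

type ProviderFn = (address: string) => Promise<ProviderResult | null>;

// Tries each provider in order. Returns the first result at or above minConfidence;
// otherwise falls back to the highest-confidence result seen, or null if there was none.
async function geocodeWithFallback(
  address: string,
  chain: ProviderFn[],
  minConfidence = 50
): Promise<ProviderResult | null> {
  let best: ProviderResult | null = null;
  for (const lookup of chain) {
    try {
      const result = await lookup(address);
      if (!result) continue;
      if (result.confidence >= minConfidence) return result;
      if (best === null || result.confidence > best.confidence) best = result;
    } catch {
      // A failing provider should not abort the chain; move on to the next one.
    }
  }
  return best;
}
```

Callers can still compare the returned confidence against the threshold to decide whether the location needs manual review.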
Address Normalization¶
Pre-Geocoding Normalization:
const normalizeAddressForGeocoding = (address: string): string => {
let normalized = address;
// Remove extra whitespace
normalized = normalized.replace(/\s+/g, ' ').trim();
// Standardize abbreviations
const replacements: Record<string, string> = {
'Street': 'St',
'Avenue': 'Ave',
'Road': 'Rd',
'Drive': 'Dr',
'Boulevard': 'Blvd',
'Apartment': 'Apt',
'Unit': 'Unit',
'Suite': 'Ste'
};
Object.entries(replacements).forEach(([long, short]) => {
const regex = new RegExp(`\\b${long}\\b`, 'gi');
normalized = normalized.replace(regex, short);
});
// Ensure postal code spacing (Canadian format)
normalized = normalized.replace(/([A-Z]\d[A-Z])(\d[A-Z]\d)/, '$1 $2');
// Remove periods from abbreviations
normalized = normalized.replace(/\./g, '');
return normalized;
};
Improvements:
- Reduces geocoding errors by 10-15%
- Increases confidence scores
- Better cache hit rate
Geocoding Cache¶
Redis Cache Implementation:
// geocoding.service.ts
private async geocodeWithCache(address: string): Promise<GeocodeResult | null> {
const cacheKey = `geocode:${normalizeAddress(address)}`;
// Check cache
const cached = await redis.get(cacheKey);
if (cached) {
logger.debug('Geocoding cache hit:', address);
return JSON.parse(cached);
}
// Cache miss, geocode
const result = await this.geocode(address);
if (result) {
// Cache for 30 days
await redis.setex(cacheKey, 2592000, JSON.stringify(result));
}
return result;
}
Benefits:
- Reduces API costs (90% cache hit rate)
- Faster response times (Redis: <5ms vs API: 200-500ms)
- Consistent results for same address
- Provider API rate limit avoidance
Manual Verification¶
Critical Location Verification:
Manually verify high-priority locations:
- Campaign offices: Ensure exact coordinates
- Shift start points: Verify accessibility
- Event venues: Confirm entrance location
- Polling stations: Critical for voter info
Verification Process:
// Mark location as manually verified
await prisma.location.update({
where: { id: locationId },
data: {
geocodeConfidence: 100,
geocodeProvider: 'MANUAL',
geocodedAt: new Date()
}
});
Regular Audits¶
Monthly Quality Audit Checklist:
- Run quality report:
- Check metrics against thresholds:
  - Geocoded % > 95%
  - Avg confidence > 70
  - Low confidence count < 50
  - Duplicates < 20
- Review low-confidence locations:
  - Filter locations with confidence < 50
  - Review top 20 by address
  - Identify patterns (specific streets, providers)
- Bulk re-geocode low confidence:
  - Use GOOGLE provider for accuracy
  - Monitor improvement in avg confidence
- Resolve duplicates:
  - Review all duplicate groups
  - Merge or mark as multi-unit
  - Update addresses as needed
- Export quality report:
Code Examples¶
DataQualityDashboardPage.tsx¶
import React, { useEffect, useState } from 'react';
import { Card, Row, Col, Statistic, Table, Tabs, Button, message } from 'antd';
import { WarningOutlined, CheckCircleOutlined } from '@ant-design/icons';
import { api } from '@/lib/api';
import { Bar } from 'react-chartjs-2';
interface GeocodeStats {
total: number;
geocoded: number;
geocodedPercent: number;
avgConfidence: number;
providerBreakdown: Record<string, number>;
confidenceDistribution: Record<string, number>;
lowConfidenceCount: number;
missingCoordinates: number;
duplicatesCount: number;
}
const DataQualityDashboardPage: React.FC = () => {
const [stats, setStats] = useState<GeocodeStats | null>(null);
const [lowConfLocations, setLowConfLocations] = useState<any[]>([]);
const [duplicates, setDuplicates] = useState<any[]>([]);
const [loading, setLoading] = useState(false);
useEffect(() => {
fetchStats();
fetchLowConfidenceLocations();
fetchDuplicates();
}, []);
const fetchStats = async () => {
setLoading(true);
try {
const { data } = await api.get<GeocodeStats>('/locations/geocode-stats');
setStats(data);
} catch (error) {
message.error('Failed to load statistics');
} finally {
setLoading(false);
}
};
const fetchLowConfidenceLocations = async () => {
try {
const { data } = await api.get('/locations?geocodeConfidence=lt:50&limit=100');
setLowConfLocations(data.data);
} catch (error) {
message.error('Failed to load low-confidence locations');
}
};
const fetchDuplicates = async () => {
try {
const { data } = await api.get('/locations/duplicates');
setDuplicates(data.duplicates);
} catch (error) {
message.error('Failed to load duplicates');
}
};
const handleRegeocodeLocation = async (locationId: number) => {
try {
await api.post(`/locations/${locationId}/regeocode`, { provider: 'GOOGLE' });
message.success('Location re-geocoded successfully');
fetchStats();
fetchLowConfidenceLocations();
} catch (error) {
message.error('Failed to re-geocode location');
}
};
const confidenceChartData = stats ? {
labels: Object.keys(stats.confidenceDistribution),
datasets: [{
label: 'Locations',
data: Object.values(stats.confidenceDistribution),
backgroundColor: [
'#e74c3c', // 0-20: Red
'#f39c12', // 21-40: Orange
'#f1c40f', // 41-60: Yellow
'#3498db', // 61-80: Blue
'#27ae60' // 81-100: Green
]
}]
} : null;
const lowConfColumns = [
{ title: 'Address', dataIndex: 'address', key: 'address' },
{ title: 'Confidence', dataIndex: 'geocodeConfidence', key: 'confidence', render: (val: number) => (
<span style={{ color: val < 30 ? '#e74c3c' : '#f39c12' }}>{val}</span>
)},
{ title: 'Provider', dataIndex: 'geocodeProvider', key: 'provider' },
{ title: 'Action', key: 'action', render: (_: any, record: any) => (
<Button size="small" onClick={() => handleRegeocodeLocation(record.id)}>
Re-geocode
</Button>
)}
];
return (
<div>
<h1>Data Quality Dashboard</h1>
{/* Statistics Cards */}
<Row gutter={16} style={{ marginBottom: 24 }}>
<Col span={6}>
<Card>
<Statistic
title="Total Locations"
value={stats?.total || 0}
prefix={<CheckCircleOutlined />}
/>
</Card>
</Col>
<Col span={6}>
<Card>
<Statistic
title="Geocoded"
value={stats?.geocoded || 0}
suffix={`(${stats?.geocodedPercent.toFixed(1) || 0}%)`}
valueStyle={{ color: (stats?.geocodedPercent || 0) > 95 ? '#27ae60' : '#f39c12' }}
/>
</Card>
</Col>
<Col span={6}>
<Card>
<Statistic
title="Avg Confidence"
value={stats?.avgConfidence.toFixed(1) || 0}
valueStyle={{ color: (stats?.avgConfidence || 0) > 70 ? '#27ae60' : '#f39c12' }}
/>
</Card>
</Col>
<Col span={6}>
<Card>
<Statistic
title="Low Confidence"
value={stats?.lowConfidenceCount || 0}
prefix={<WarningOutlined />}
valueStyle={{ color: (stats?.lowConfidenceCount || 0) > 50 ? '#e74c3c' : '#f39c12' }}
/>
</Card>
</Col>
</Row>
{/* Charts and Tables */}
<Tabs
items={[
{
key: 'overview',
label: 'Overview',
children: (
<div>
<Card title="Confidence Distribution" style={{ marginBottom: 24 }}>
{confidenceChartData && <Bar data={confidenceChartData} />}
</Card>
<Card title="Provider Performance">
<Table
dataSource={stats ? Object.entries(stats.providerBreakdown).map(([provider, count]) => ({
provider,
count
})) : []}
columns={[
{ title: 'Provider', dataIndex: 'provider', key: 'provider' },
{ title: 'Count', dataIndex: 'count', key: 'count' }
]}
pagination={false}
/>
</Card>
</div>
)
},
{
key: 'low-confidence',
label: `Low Confidence (${lowConfLocations.length})`,
children: (
<Table
dataSource={lowConfLocations}
columns={lowConfColumns}
rowKey="id"
loading={loading}
/>
)
},
{
key: 'duplicates',
label: `Duplicates (${duplicates.length})`,
children: (
<Table
dataSource={duplicates}
columns={[
{ title: 'Coordinates', key: 'coords', render: (_, record: any) =>
`${record.coordinates.latitude.toFixed(6)}, ${record.coordinates.longitude.toFixed(6)}`
},
{ title: 'Count', dataIndex: 'count', key: 'count' },
{ title: 'Addresses', key: 'addresses', render: (_, record: any) =>
record.locations.map((l: any) => l.address).join(', ')
}
]}
rowKey={(record) => `${record.coordinates.latitude}-${record.coordinates.longitude}`}
/>
)
}
]}
/>
</div>
);
};
export default DataQualityDashboardPage;
Geocode Statistics Service¶
// locations.service.ts
import { prisma } from '@/config/database';
import type { GeocodeProvider } from '@prisma/client';
export class LocationsService {
async getGeocodeStats() {
const locations = await prisma.location.findMany({
select: {
id: true,
latitude: true,
longitude: true,
geocodeConfidence: true,
geocodeProvider: true
}
});
const total = locations.length;
// Compare against null so that a valid 0 coordinate still counts as geocoded
const geocoded = locations.filter(l => l.latitude != null && l.longitude != null).length;
const sumConfidence = locations.reduce((sum, l) => sum + (l.geocodeConfidence || 0), 0);
const avgConfidence = total > 0 ? sumConfidence / total : 0;
// Provider breakdown
const providerBreakdown: Record<string, number> = {};
locations.forEach(l => {
const provider = l.geocodeProvider || 'UNKNOWN';
providerBreakdown[provider] = (providerBreakdown[provider] || 0) + 1;
});
// Confidence distribution
const confidenceDistribution = {
'0-20': 0,
'21-40': 0,
'41-60': 0,
'61-80': 0,
'81-100': 0
};
locations.forEach(l => {
const conf = l.geocodeConfidence || 0;
if (conf <= 20) confidenceDistribution['0-20']++;
else if (conf <= 40) confidenceDistribution['21-40']++;
else if (conf <= 60) confidenceDistribution['41-60']++;
else if (conf <= 80) confidenceDistribution['61-80']++;
else confidenceDistribution['81-100']++;
});
const lowConfidenceCount = locations.filter(l => (l.geocodeConfidence || 0) < 50).length;
const duplicatesCount = await this.countDuplicates();
return {
total,
geocoded,
geocodedPercent: total > 0 ? (geocoded / total) * 100 : 0,
avgConfidence,
providerBreakdown,
confidenceDistribution,
lowConfidenceCount,
missingCoordinates: total - geocoded,
duplicatesCount
};
}
async countDuplicates(): Promise<number> {
const locations = await prisma.location.findMany({
where: {
AND: [
{ latitude: { not: null } },
{ longitude: { not: null } }
]
},
select: { latitude: true, longitude: true }
});
const coordMap = new Map<string, number>();
locations.forEach(l => {
const key = `${l.latitude!.toFixed(6)},${l.longitude!.toFixed(6)}`;
coordMap.set(key, (coordMap.get(key) || 0) + 1);
});
return Array.from(coordMap.values()).filter(count => count > 1).reduce((sum, count) => sum + count, 0);
}
async regeocode(locationId: number, provider?: GeocodeProvider) {
const location = await prisma.location.findUnique({
where: { id: locationId }
});
if (!location) {
throw new Error('Location not found');
}
const result = await geocodingService.geocode(location.address, provider);
if (!result) {
throw new Error('Geocoding failed');
}
return await prisma.location.update({
where: { id: locationId },
data: {
latitude: result.latitude,
longitude: result.longitude,
geocodeConfidence: result.confidence,
geocodeProvider: result.provider,
geocodedAt: new Date()
}
});
}
}
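The value returned by countDuplicates is the number of locations that belong to a duplicate group, not the number of groups. The sketch below reproduces that counting rule as a pure function so it can be checked without a database. The `Coord` type, the function name, and the sample coordinates are illustrative assumptions, not part of the service.

```typescript
// Hypothetical input shape: only the coordinates matter for this sketch.
interface Coord {
  latitude: number;
  longitude: number;
}

// Counts locations that share a 6-decimal coordinate key with at least one other location.
function countDuplicateLocations(coords: Coord[]): number {
  const groupSizes = new Map<string, number>();
  for (const c of coords) {
    const key = `${c.latitude.toFixed(6)},${c.longitude.toFixed(6)}`;
    groupSizes.set(key, (groupSizes.get(key) ?? 0) + 1);
  }
  let total = 0;
  groupSizes.forEach((size) => {
    if (size > 1) total += size; // every member of a group counts, not just the extras
  });
  return total;
}

// Made-up sample: one group of three, one group of two, one unique location.
const sample: Coord[] = [
  { latitude: 53.5461, longitude: -113.4938 },
  { latitude: 53.5461, longitude: -113.4938 },
  { latitude: 53.5461, longitude: -113.4938 },
  { latitude: 51.0447, longitude: -114.0719 },
  { latitude: 51.0447, longitude: -114.0719 },
  { latitude: 49.2827, longitude: -123.1207 },
];

const duplicateLocations = countDuplicateLocations(sample);
console.log(duplicateLocations); // 5
```

Six decimal places correspond to roughly 0.1 m of latitude, so this key only groups points that are effectively identical.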
Troubleshooting¶
Problem: Many low-confidence locations¶
Symptoms:
- More than 100 locations with confidence < 50
- Average confidence < 60
- Prometheus alert firing
Solutions:
- Check provider API keys
- Try a different primary provider
- Verify address format
- Use postal code for better accuracy
- Bulk re-geocode with Google
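Before a bulk re-geocode, the candidate locations can be selected with the same rule the statistics service uses (confidence below 50, missing confidence treated as 0) and split into batches. The following is a minimal sketch of that selection step only; the function name, the `LocationConfidence` shape, and the default batch size are assumptions, and the request to the bulk-geocode endpoint is left out because its payload is not described here.

```typescript
// Hypothetical projection of a location row.
interface LocationConfidence {
  id: number;
  geocodeConfidence: number | null;
}

// Returns ids of low-confidence locations, grouped into batches of at most batchSize.
function selectLowConfidenceBatches(
  locations: LocationConfidence[],
  threshold = 50,
  batchSize = 50
): number[][] {
  const ids = locations
    .filter((l) => (l.geocodeConfidence ?? 0) < threshold) // null counts as 0
    .map((l) => l.id);
  const batches: number[][] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Made-up data: ids 1, 2 and 4 fall below the threshold.
const candidates: LocationConfidence[] = [
  { id: 1, geocodeConfidence: 10 },
  { id: 2, geocodeConfidence: null },
  { id: 3, geocodeConfidence: 50 },
  { id: 4, geocodeConfidence: 49 },
  { id: 5, geocodeConfidence: 90 },
];

const batches = selectLowConfidenceBatches(candidates, 50, 2);
console.log(JSON.stringify(batches)); // [[1,2],[4]]
```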
Problem: Duplicate locations detected¶
Symptoms:
- Multiple locations at the same coordinates
- Duplicates tab shows many groups
- Inflated location counts in cuts
Solutions:
- Check whether the addresses are legitimately multi-unit
- Verify geocoding precision
- Review the NAR import process
- Merge duplicates:
// Merge duplicate locations into a primary location.
// Both writes run in a single transaction so the merge is applied atomically.
const mergeDuplicates = async (primaryId: number, duplicateIds: number[]) => {
  await prisma.$transaction([
    // Move addresses to the primary location
    prisma.address.updateMany({
      where: { locationId: { in: duplicateIds } },
      data: { locationId: primaryId }
    }),
    // Delete the duplicates
    prisma.location.deleteMany({
      where: { id: { in: duplicateIds } }
    })
  ]);
};
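The merge function needs a primary id and a list of duplicate ids, but this section does not say how the primary is chosen. One possible rule, shown here purely as an assumption, is to keep the location with the highest geocode confidence and break ties by lowest id. `planMerge` and `DuplicateCandidate` are hypothetical names.

```typescript
// Hypothetical shape for one member of a duplicate group.
interface DuplicateCandidate {
  id: number;
  geocodeConfidence: number | null;
}

interface MergePlan {
  primaryId: number;
  duplicateIds: number[];
}

// Picks the highest-confidence location as primary (ties go to the lowest id).
// Returns null when the group has fewer than two members, since nothing needs merging.
function planMerge(group: DuplicateCandidate[]): MergePlan | null {
  if (group.length < 2) return null;
  const sorted = group.slice().sort(
    (a, b) =>
      (b.geocodeConfidence ?? 0) - (a.geocodeConfidence ?? 0) || a.id - b.id
  );
  return {
    primaryId: sorted[0].id,
    duplicateIds: sorted.slice(1).map((l) => l.id),
  };
}

// Made-up group: ids 3 and 9 tie on confidence, so id 3 becomes the primary.
const plan = planMerge([
  { id: 7, geocodeConfidence: 40 },
  { id: 3, geocodeConfidence: 85 },
  { id: 9, geocodeConfidence: 85 },
]);
console.log(JSON.stringify(plan)); // {"primaryId":3,"duplicateIds":[9,7]}
```

A plan like this should only be applied after confirming the group is not a legitimate multi-unit address.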
Problem: Geocoding stats slow to load¶
Symptoms:
- GET /api/locations/geocode-stats takes > 5 seconds
- Dashboard timeout errors
- High database CPU
Solutions:
- Add database indexes (camelCase columns must be double-quoted in PostgreSQL):
CREATE INDEX CONCURRENTLY idx_locations_geocode_confidence
  ON "Location"("geocodeConfidence");
CREATE INDEX CONCURRENTLY idx_locations_geocode_provider
  ON "Location"("geocodeProvider");
CREATE INDEX CONCURRENTLY idx_locations_coords
  ON "Location"(latitude, longitude)
  WHERE latitude IS NOT NULL AND longitude IS NOT NULL;
- Cache stats in Redis
- Aggregate in SQL instead of loading every row:
// Raw SQL for better performance; returns one row per provider.
// The casts keep Prisma from returning BigInt and Decimal values.
const stats = await prisma.$queryRaw`
  SELECT
    "geocodeProvider",
    COUNT(*)::int AS total,
    COUNT(latitude)::int AS geocoded,
    AVG(COALESCE("geocodeConfidence", 0))::float AS avg_confidence,
    (COUNT(*) FILTER (WHERE "geocodeConfidence" < 50))::int AS low_confidence
  FROM "Location"
  GROUP BY "geocodeProvider"
`;
- Materialize stats view:
-- Create materialized view
CREATE MATERIALIZED VIEW geocode_stats_mv AS
SELECT
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE latitude IS NOT NULL AND longitude IS NOT NULL) AS geocoded,
  AVG(COALESCE("geocodeConfidence", 0)) AS avg_confidence,
  COUNT(*) FILTER (WHERE "geocodeConfidence" < 50) AS low_confidence
FROM "Location";
-- Refresh hourly
REFRESH MATERIALIZED VIEW geocode_stats_mv;
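The caching option above can be illustrated with a small cache-aside helper. This sketch uses an in-memory Map as a stand-in for Redis so that it runs on its own; `createTtlCache`, the key name, and the 60-second TTL are assumptions, and a real deployment would store the serialized stats in Redis with an expiry instead.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// Cache-aside helper: returns the cached value while it is fresh, otherwise reloads it.
// The clock is injectable so expiry can be exercised without waiting.
function createTtlCache<T>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, CacheEntry<T>>();
  return {
    getOrLoad(key: string, load: () => T): T {
      const hit = entries.get(key);
      if (hit !== undefined && hit.expiresAt > now()) return hit.value;
      const value = load();
      entries.set(key, { value, expiresAt: now() + ttlMs });
      return value;
    },
    invalidate(key: string): void {
      entries.delete(key);
    },
  };
}

// Usage with a fake clock and a counter standing in for the expensive stats query.
let fakeNow = 0;
let queryRuns = 0;
const statsCache = createTtlCache<{ total: number }>(60_000, () => fakeNow);
const loadStats = () => {
  queryRuns++;
  return { total: 1500 };
};

statsCache.getOrLoad('geocode-stats', loadStats); // miss: runs the query
statsCache.getOrLoad('geocode-stats', loadStats); // hit: served from cache
fakeNow = 60_000;
statsCache.getOrLoad('geocode-stats', loadStats); // expired: runs the query again
console.log(queryRuns); // 2
```

Calling `invalidate` after a bulk re-geocode or a merge would keep the dashboard from showing stale numbers for the rest of the TTL.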
Performance Considerations¶
Database Query Optimization¶
Indexes:
- geocodeConfidence (filtering)
- geocodeProvider (grouping)
- (latitude, longitude) composite (duplicate detection)
- Partial index on non-null coordinates
Query Performance:
- geocode-stats: ~500ms (1500 locations)
- Low confidence filter: ~100ms (with index)
- Duplicate detection: ~200ms (coordinate grouping)
- Bulk re-geocode: ~2-5 min (150 locations, depends on provider)
API Rate Limits¶
Provider Limits:
- Google: 50 QPS, $5/1000 requests
- Mapbox: 100,000/month free, then $0.50/1000
- Nominatim: 1 QPS (public), no commercial use
- Photon: No official limit, self-hosted recommended
- ArcGIS: 100,000/month free
Optimization:
- Use the Redis cache (30-day TTL)
- Batch geocoding jobs to stay under provider rate limits
- Fall back to free providers for non-critical lookups
- Monitor usage via provider dashboards
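The rate limits above put a floor on how long a batch can take. As a rough worked example, the sketch below computes the earliest time the last request of a batch may start when requests are spaced evenly at a provider's QPS limit. The function name is an assumption, and the result ignores provider latency, retries, and cache hits.

```typescript
// Lower bound, in milliseconds, on when the last of `requestCount` requests can start
// if requests are spaced evenly at `maxQps` requests per second.
function estimateBatchDurationMs(requestCount: number, maxQps: number): number {
  if (maxQps <= 0) throw new Error('maxQps must be positive');
  if (requestCount <= 0) return 0;
  const intervalMs = 1000 / maxQps;
  return Math.ceil((requestCount - 1) * intervalMs);
}

// 150 locations at 1 QPS (the public Nominatim limit listed above)
console.log(estimateBatchDurationMs(150, 1)); // 149000

// 150 locations at 50 QPS (the Google limit listed above)
console.log(estimateBatchDurationMs(150, 50)); // 2980
```

At 1 QPS the floor is about 2.5 minutes for 150 locations, which is consistent with the 2-5 minute bulk re-geocode figure quoted under Query Performance.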
Caching Strategy¶
Cache Layers:
- Application Cache (Redis)
- Statistics Cache
- Provider Response Cache
Cache Hit Rates:
- Geocoding: 90%+ (repeated addresses)
- Statistics: 95%+ (frequent dashboard views)
- Provider responses: 85%+ (re-geocoding attempts)
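High hit rates on repeated addresses depend on equivalent addresses mapping to the same cache key. The sketch below shows one way such a key could be built, together with the 30-day TTL mentioned earlier expressed in seconds. The key format, the normalization rules, and both names are assumptions and do not describe the actual geocoding service.

```typescript
// 30-day TTL expressed in seconds.
const GEOCODE_CACHE_TTL_SECONDS = 30 * 24 * 60 * 60;

// Builds a cache key from an address: trims, lowercases, and collapses runs of whitespace
// so that trivially different spellings of the same address share one entry.
function geocodeCacheKey(address: string, provider?: string): string {
  const normalized = address.trim().toLowerCase().replace(/\s+/g, ' ');
  return `geocode:${provider ?? 'any'}:${normalized}`;
}

console.log(geocodeCacheKey('  123  Main St ')); // geocode:any:123 main st
console.log(geocodeCacheKey('123 main st', 'GOOGLE')); // geocode:GOOGLE:123 main st
console.log(GEOCODE_CACHE_TTL_SECONDS); // 2592000
```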
Related Documentation¶
Backend Documentation¶
- Locations Service: api/src/modules/map/locations/locations.service.ts
  - Geocode stats aggregation
  - Duplicate detection
  - Re-geocoding operations
- Geocoding Service: api/src/modules/map/geocoding/geocoding.service.ts
  - Multi-provider fallback
  - Confidence calculation
  - Cache integration
- Bulk Geocoding: api/src/modules/map/locations/bulk-geocode.routes.ts
  - Job queue integration
  - Progress tracking
  - Error handling
Frontend Documentation¶
- Data Quality Dashboard: admin/src/pages/DataQualityDashboardPage.tsx
  - Statistics display
  - Charts and tables
  - Bulk actions
- Locations Page: admin/src/pages/LocationsPage.tsx
  - CSV import/export
  - Inline geocoding
  - Address editing
Database Documentation¶
- Location Model: api/prisma/schema.prisma
  - Geocoding metadata fields
  - Indexes for performance
  - Relations to Address
Monitoring Documentation¶
- Prometheus Metrics: api/src/utils/metrics.ts
  - Custom geocoding metrics
  - Quality gauges
  - Alert integration
- Grafana Dashboard: configs/grafana/dashboards/data-quality.json
  - Quality trend charts
  - Provider comparison
  - Alert visualization
External Resources¶
- Google Geocoding API: https://developers.google.com/maps/documentation/geocoding
- Mapbox Geocoding API: https://docs.mapbox.com/api/search/geocoding
- Nominatim API: https://nominatim.org/release-docs/latest/api/Search
- Photon API: https://photon.komoot.io