Data Quality Dashboard
Overview
The Data Quality Dashboard provides comprehensive monitoring and management of geocoding accuracy and location data integrity. This feature enables campaign administrators to identify and resolve data quality issues, track geocoding provider performance, and ensure reliable map data for canvassing operations.
Key Features:
- Real-time geocoding quality metrics
- Provider success rate tracking
- Low-confidence location detection
- Duplicate location identification
- Bulk re-geocoding operations
- Address validation reporting
- Interactive quality charts
- Export quality reports
Use Cases:
- Monthly data quality audits
- NAR import validation
- Geocoding provider evaluation
- Pre-canvass data verification
- Address database cleanup
- Campaign planning accuracy checks
Architecture Highlights:
- Aggregate statistics via database queries
- Confidence threshold filtering (0-100 scale)
- Provider performance comparison
- Duplicate detection via coordinate matching
- Manual review workflows
- Prometheus metrics integration
Architecture
flowchart TB
subgraph Admin Interface
Admin[Admin User]
Dashboard[DataQualityDashboardPage]
LocationsPage[LocationsPage]
end
subgraph API Layer
StatsAPI["/api/locations/geocode-stats"]
LocationsAPI["/api/locations"]
DuplicatesAPI["/api/locations/duplicates"]
RegeocodeAPI["/api/locations/:id/regeocode"]
BulkGeocodeAPI["/api/locations/bulk-geocode"]
end
subgraph Database
LocationsDB[(Locations)]
Indexes[(Indexes)]
end
subgraph Geocoding Service
GeocodingService[GeocodingService]
Providers[6 Providers]
Cache[Redis Cache]
end
subgraph Monitoring
Prometheus[Prometheus]
Metrics[cm_locations_low_confidence_count]
end
Admin --> Dashboard
Admin --> LocationsPage
Dashboard --> StatsAPI
Dashboard --> LocationsAPI
Dashboard --> DuplicatesAPI
LocationsPage --> RegeocodeAPI
LocationsPage --> BulkGeocodeAPI
StatsAPI --> LocationsDB
LocationsAPI --> LocationsDB
DuplicatesAPI --> LocationsDB
RegeocodeAPI --> GeocodingService
BulkGeocodeAPI --> GeocodingService
LocationsDB --> Indexes
GeocodingService --> Providers
GeocodingService --> Cache
StatsAPI --> Prometheus
Prometheus --> Metrics
Data Flow:
- Statistics Aggregation:
  - Query all locations with geocoding metadata
  - Calculate aggregate metrics (total, geocoded %, avg confidence)
  - Group by provider for success rate comparison
  - Identify low-confidence locations (< 50)
  - Detect duplicates via coordinate matching
- Quality Review:
  - Admin views dashboard statistics
  - Filters low-confidence locations
  - Reviews individual location details
  - Identifies patterns (provider failures, address format issues)
- Remediation:
  - Manual address correction
  - Single location re-geocoding
  - Bulk re-geocoding with different provider
  - Duplicate merging or marking
- Monitoring:
  - Prometheus metrics track quality trends
  - Alert rules trigger for quality degradation
  - Grafana dashboards visualize provider performance
Database Models
Location Model
model Location {
id Int @id @default(autoincrement())
address String
latitude Float?
longitude Float?
postalCode String?
province String?
// Geocoding metadata
geocodeConfidence Int? // 0-100 quality score
geocodeProvider String? // Provider used for geocoding
geocodedAt DateTime? // Timestamp of last geocode
// NAR import fields
locGuid String? @unique
federalDistrict String?
buildingUse Int? // 1 = Residential
addresses Address[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@index([geocodeConfidence])
@@index([geocodeProvider])
@@index([latitude, longitude])
@@index([latitude, longitude], where: latitude IS NOT NULL AND longitude IS NOT NULL)
}
Geocode Confidence Scale:
- 0-20: Very Low (manual review required)
- 21-40: Low (likely incorrect, re-geocode recommended)
- 41-60: Medium (acceptable but consider verification)
- 61-80: Good (likely accurate)
- 81-100: Excellent (high confidence)
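As a minimal sketch of how the scale above might be applied in code, the helper below maps a stored score to its band. The name `confidenceBand` and the `'Ungeocoded'` label for null scores are illustrative assumptions, not part of the documented API.

```typescript
type ConfidenceBand = 'Very Low' | 'Low' | 'Medium' | 'Good' | 'Excellent';

// Hypothetical helper: maps a 0-100 score to the bands listed above.
// A null score (never geocoded) gets its own label instead of counting as 0.
function confidenceBand(score: number | null): ConfidenceBand | 'Ungeocoded' {
  if (score === null) return 'Ungeocoded';
  const s = Math.max(0, Math.min(100, score)); // clamp out-of-range input
  if (s <= 20) return 'Very Low';
  if (s <= 40) return 'Low';
  if (s <= 60) return 'Medium';
  if (s <= 80) return 'Good';
  return 'Excellent';
}
```

Keeping null separate from 0 is a design choice made here for illustration; it lets a report distinguish "never geocoded" from "geocoded badly".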
Geocode Provider Enum:
enum GeocodeProvider {
GOOGLE = 'GOOGLE',
MAPBOX = 'MAPBOX',
NOMINATIM = 'NOMINATIM',
PHOTON = 'PHOTON',
LOCATIONIQ = 'LOCATIONIQ',
ARCGIS = 'ARCGIS',
UNKNOWN = 'UNKNOWN'
}
Address Model
model Address {
id Int @id @default(autoincrement())
locationId Int
location Location @relation(fields: [locationId], references: [id], onDelete: Cascade)
unitNumber String?
firstName String?
lastName String?
supportLevel Int?
notes String?
// Address validation
isValidated Boolean @default(false)
validatedAt DateTime?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@index([locationId])
}
API Endpoints
GET /api/locations/geocode-stats
Fetch aggregate geocoding quality statistics.
Authentication: Required (SUPER_ADMIN, MAP_ADMIN)
Response:
{
"total": 1500,
"geocoded": 1450,
"geocodedPercent": 96.67,
"avgConfidence": 78.5,
"providerBreakdown": {
"GOOGLE": 800,
"MAPBOX": 350,
"NOMINATIM": 200,
"PHOTON": 100,
"ARCGIS": 0,
"LOCATIONIQ": 0,
"UNKNOWN": 50
},
"confidenceDistribution": {
"0-20": 15,
"21-40": 35,
"41-60": 150,
"61-80": 450,
"81-100": 800
},
"lowConfidenceCount": 50,
"missingCoordinates": 50,
"duplicatesCount": 12
}
Implementation:
// locations.service.ts
async getGeocodeStats() {
const locations = await prisma.location.findMany({
select: {
latitude: true,
longitude: true,
geocodeConfidence: true,
geocodeProvider: true
}
});
const total = locations.length;
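// Compare against null so that a legitimate coordinate of 0 still counts as geocoded
const geocoded = locations.filter(l => l.latitude != null && l.longitude != null).length;
// Guard against an empty table to avoid dividing by zero (NaN)
const avgConfidence = total > 0
? locations.reduce((sum, l) => sum + (l.geocodeConfidence || 0), 0) / total
: 0;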
const providerBreakdown = locations.reduce((acc, l) => {
const provider = l.geocodeProvider || 'UNKNOWN';
acc[provider] = (acc[provider] || 0) + 1;
return acc;
}, {} as Record<string, number>);
const confidenceDistribution = {
'0-20': 0,
'21-40': 0,
'41-60': 0,
'61-80': 0,
'81-100': 0
};
locations.forEach(l => {
const conf = l.geocodeConfidence || 0;
if (conf <= 20) confidenceDistribution['0-20']++;
else if (conf <= 40) confidenceDistribution['21-40']++;
else if (conf <= 60) confidenceDistribution['41-60']++;
else if (conf <= 80) confidenceDistribution['61-80']++;
else confidenceDistribution['81-100']++;
});
const lowConfidenceCount = locations.filter(l =>
(l.geocodeConfidence || 0) < 50).length;
return {
total,
geocoded,
geocodedPercent: total > 0 ? (geocoded / total) * 100 : 0,
avgConfidence,
providerBreakdown,
confidenceDistribution,
lowConfidenceCount,
missingCoordinates: total - geocoded,
duplicatesCount: await this.countDuplicates()
};
}
GET /api/locations?geocodeConfidence=lt:50
Fetch locations filtered by geocode confidence.
Authentication: Required
Query Parameters:
- geocodeConfidence (filter): lt:X, gt:X, eq:X, null
- geocodeProvider (filter): Provider name (GOOGLE, MAPBOX, etc.)
- page (optional): Page number (default: 1)
- limit (optional): Results per page (default: 50)
- sortBy (optional): Field to sort by (default: "geocodeConfidence")
- order (optional): "asc" or "desc" (default: "asc")
Examples:
GET /api/locations?geocodeConfidence=lt:50
GET /api/locations?geocodeConfidence=null
GET /api/locations?geocodeProvider=NOMINATIM&geocodeConfidence=lt:70
GET /api/locations?geocodeConfidence=gt:80&sortBy=address
Response:
{
"data": [
{
"id": 1001,
"address": "123 Main St",
"latitude": 43.6532,
"longitude": -79.3832,
"postalCode": "M5H 2N2",
"geocodeConfidence": 45,
"geocodeProvider": "NOMINATIM",
"geocodedAt": "2025-02-10T10:00:00Z",
"addresses": [...]
}
],
"pagination": {
"page": 1,
"limit": 50,
"total": 150,
"pages": 3
}
}
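The `geocodeConfidence` filter syntax for this endpoint (`lt:X`, `gt:X`, `eq:X`, `null`) could be parsed along the following lines. This is only a sketch: `parseConfidenceFilter` and the `ConfidenceFilter` shape are hypothetical names, and the output merely imitates Prisma-style operators; the actual controller may parse the parameter differently.

```typescript
// Hypothetical shape for a parsed filter; mirrors Prisma-style operators.
type ConfidenceFilter =
  | { equals: number | null }
  | { lt: number }
  | { gt: number };

function parseConfidenceFilter(raw: string): ConfidenceFilter {
  // "null" selects locations that were never geocoded
  if (raw === 'null') return { equals: null };

  const match = /^(lt|gt|eq):(\d{1,3})$/.exec(raw);
  if (!match) throw new Error(`Invalid geocodeConfidence filter: ${raw}`);

  const value = Number(match[2]);
  if (value > 100) throw new Error(`Confidence out of range: ${value}`);

  switch (match[1]) {
    case 'lt': return { lt: value };
    case 'gt': return { gt: value };
    default: return { equals: value };
  }
}
```

Rejecting malformed input up front, rather than silently ignoring it, keeps a typo such as `lte:50` from returning an unfiltered result set.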
GET /api/locations/duplicates
Identify locations with identical coordinates.
Authentication: Required (SUPER_ADMIN, MAP_ADMIN)
Query Parameters:
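- threshold (optional): Distance threshold in meters (default: 1, matches exact duplicates). Note that the implementation shown below accepts this value but does not use it; it groups locations by coordinates rounded to 6 decimal places.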
Response:
{
"duplicates": [
{
"coordinates": {
"latitude": 43.6532,
"longitude": -79.3832
},
"count": 3,
"locations": [
{
"id": 1001,
"address": "123 Main St",
"postalCode": "M5H 2N2"
},
{
"id": 1002,
"address": "123 Main Street",
"postalCode": "M5H 2N2"
},
{
"id": 1003,
"address": "123 Main St, Unit 1",
"postalCode": "M5H 2N2"
}
]
}
],
"total": 12
}
Implementation:
// locations.service.ts
async findDuplicates(thresholdMeters: number = 1) {
const locations = await prisma.location.findMany({
where: {
AND: [
{ latitude: { not: null } },
{ longitude: { not: null } }
]
},
select: {
id: true,
address: true,
latitude: true,
longitude: true,
postalCode: true
}
});
const coordMap = new Map<string, typeof locations>();
locations.forEach(loc => {
// Round to 6 decimal places (~0.1m precision)
const key = `${loc.latitude!.toFixed(6)},${loc.longitude!.toFixed(6)}`;
if (!coordMap.has(key)) {
coordMap.set(key, []);
}
coordMap.get(key)!.push(loc);
});
const duplicates = Array.from(coordMap.entries())
.filter(([_, locs]) => locs.length > 1)
.map(([coords, locs]) => {
const [lat, lng] = coords.split(',').map(Number);
return {
coordinates: { latitude: lat, longitude: lng },
count: locs.length,
locations: locs
};
});
return {
duplicates,
total: duplicates.reduce((sum, dup) => sum + dup.count, 0)
};
}
POST /api/locations/:id/regeocode
Re-geocode a single location with specified provider.
Authentication: Required (SUPER_ADMIN, MAP_ADMIN)
Request Body:
{
"provider": "GOOGLE",
"address": "123 Main St, Toronto ON M5H 2N2"
}
Parameters:
- provider (optional): Specific provider to use (default: fallback chain)
- address (optional): Override address string (default: use existing)
Response:
{
"id": 1001,
"address": "123 Main St",
"latitude": 43.6532,
"longitude": -79.3832,
"geocodeConfidence": 95,
"geocodeProvider": "GOOGLE",
"geocodedAt": "2025-02-13T10:30:00Z"
}
POST /api/locations/bulk-geocode
Bulk re-geocode multiple locations.
Authentication: Required (SUPER_ADMIN, MAP_ADMIN)
Request Body:
{
"locationIds": [1001, 1002, 1003],
"provider": "GOOGLE",
"confidenceThreshold": 50
}
Parameters:
- locationIds (optional): Specific location IDs (default: all with confidence < threshold)
- provider (optional): Specific provider to use (default: fallback chain)
- confidenceThreshold (optional): Only re-geocode locations below this confidence (default: 50)
Response:
{
"jobId": "bulk-geocode-20250213-103000",
"status": "queued",
"total": 150,
"message": "Bulk geocoding job started"
}
Job Progress Endpoint:
GET /api/locations/bulk-geocode/:jobId
Job Status Response:
{
"jobId": "bulk-geocode-20250213-103000",
"status": "processing",
"progress": {
"total": 150,
"processed": 75,
"successful": 70,
"failed": 5,
"percent": 50
},
"startedAt": "2025-02-13T10:30:00Z",
"estimatedCompletion": "2025-02-13T10:35:00Z"
}
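The `percent` and `estimatedCompletion` fields in the job status response can be derived from the raw counters. The sketch below shows one naive way to do so, assuming the remaining items take as long on average as those already processed; `summarizeProgress` is a hypothetical helper, not the documented job runner.

```typescript
interface JobProgress {
  total: number;
  processed: number;
  successful: number;
  failed: number;
}

// Hypothetical helper: derives percent complete and a linear-extrapolation ETA.
function summarizeProgress(p: JobProgress, startedAt: Date, now: Date) {
  const percent = p.total > 0 ? Math.round((p.processed / p.total) * 100) : 0;
  const elapsedMs = now.getTime() - startedAt.getTime();
  const remaining = p.total - p.processed;
  // No ETA until at least one item has been processed
  const etaMs = p.processed > 0 ? (elapsedMs / p.processed) * remaining : null;
  return {
    percent,
    estimatedCompletion: etaMs === null ? null : new Date(now.getTime() + etaMs),
  };
}
```

With the sample values above (75 of 150 processed, started at 10:30:00Z) and a current time of 10:32:30Z, this yields 50% and an estimated completion of 10:35:00Z.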
Configuration
Environment Variables
| Variable | Type | Default | Description |
|---|---|---|---|
| GEOCODE_CONFIDENCE_THRESHOLD | number | 50 | Minimum confidence for acceptable geocoding |
| GEOCODE_PRIMARY_PROVIDER | string | GOOGLE | Primary geocoding provider |
| GEOCODE_FALLBACK_PROVIDERS | string | MAPBOX,NOMINATIM | Comma-separated fallback providers |
| GEOCODE_CACHE_TTL | number | 2592000 | Cache TTL in seconds (30 days) |
Quality Thresholds
| Metric | Warning | Critical | Description |
|---|---|---|---|
| Geocoded % | < 95% | < 90% | Percentage of locations with coordinates |
| Avg Confidence | < 70 | < 60 | Average geocode confidence score |
| Low Confidence Count | > 50 | > 100 | Locations with confidence < 50 |
| Duplicates | > 20 | > 50 | Locations with identical coordinates |
| Missing Coordinates | > 5% | > 10% | Locations without lat/lng |
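To make the table concrete, here is a minimal sketch that classifies a stats payload against these thresholds. The `evaluateQuality` function and `Severity` type are illustrative assumptions; the numeric limits are copied from the table. Missing Coordinates is omitted because, as a percentage, it is the complement of Geocoded %.

```typescript
type Severity = 'ok' | 'warning' | 'critical';

interface QualityStats {
  geocodedPercent: number;
  avgConfidence: number;
  lowConfidenceCount: number;
  duplicatesCount: number;
}

// "below" metrics alarm when the value drops under a limit,
// "above" metrics alarm when the value exceeds a limit.
const below = (v: number, warn: number, crit: number): Severity =>
  v < crit ? 'critical' : v < warn ? 'warning' : 'ok';
const above = (v: number, warn: number, crit: number): Severity =>
  v > crit ? 'critical' : v > warn ? 'warning' : 'ok';

function evaluateQuality(s: QualityStats): Record<keyof QualityStats, Severity> {
  return {
    geocodedPercent: below(s.geocodedPercent, 95, 90),
    avgConfidence: below(s.avgConfidence, 70, 60),
    lowConfidenceCount: above(s.lowConfidenceCount, 50, 100),
    duplicatesCount: above(s.duplicatesCount, 20, 50),
  };
}
```

Because the comparisons are strict, a value that sits exactly on a limit (for example a low-confidence count of exactly 50) is still reported as `ok`.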
Prometheus Metrics
Custom Metrics:
// api/src/utils/metrics.ts
import { Gauge } from 'prom-client';
export const geocodingQualityGauge = new Gauge({
name: 'cm_geocoding_avg_confidence',
help: 'Average geocoding confidence score (0-100)',
async collect() {
const stats = await locationsService.getGeocodeStats();
this.set(stats.avgConfidence);
}
});
export const lowConfidenceLocationsGauge = new Gauge({
name: 'cm_locations_low_confidence_count',
help: 'Number of locations with geocode confidence < 50',
async collect() {
const stats = await locationsService.getGeocodeStats();
this.set(stats.lowConfidenceCount);
}
});
export const geocodedPercentGauge = new Gauge({
name: 'cm_locations_geocoded_percent',
help: 'Percentage of locations with coordinates',
async collect() {
const stats = await locationsService.getGeocodeStats();
this.set(stats.geocodedPercent);
}
});
export const duplicateLocationsGauge = new Gauge({
name: 'cm_locations_duplicates_count',
help: 'Number of duplicate location entries',
async collect() {
const duplicates = await locationsService.findDuplicates();
this.set(duplicates.total);
}
});
Alert Rules:
# configs/prometheus/alerts.yml
groups:
- name: data_quality
interval: 5m
rules:
- alert: LowGeocodingConfidence
expr: cm_geocoding_avg_confidence < 60
for: 10m
labels:
severity: warning
annotations:
summary: Low average geocoding confidence
description: "Average geocoding confidence is {{ $value }}, below threshold of 60"
- alert: HighLowConfidenceLocations
expr: cm_locations_low_confidence_count > 100
for: 5m
labels:
severity: critical
annotations:
summary: High number of low-confidence locations
description: "{{ $value }} locations have geocoding confidence < 50"
- alert: LowGeocodedPercent
expr: cm_locations_geocoded_percent < 90
for: 10m
labels:
severity: warning
annotations:
summary: Low percentage of geocoded locations
description: "Only {{ $value }}% of locations have coordinates"
- alert: HighDuplicateLocations
expr: cm_locations_duplicates_count > 50
for: 15m
labels:
severity: warning
annotations:
summary: High number of duplicate locations
description: "{{ $value }} duplicate location entries detected"
Quality Metrics
Geocoding Confidence
Calculation:
Geocoding confidence is calculated based on multiple factors:
interface GeocodeResult {
latitude: number;
longitude: number;
matchType: 'exact' | 'interpolated' | 'approximate' | 'fallback';
addressComponents: {
streetNumber?: string;
street?: string;
city?: string;
postalCode?: string;
province?: string;
};
providerConfidence?: number; // Provider-specific score
}
function calculateConfidence(result: GeocodeResult, inputAddress: string): number {
let confidence = 0;
// Match type (0-40 points)
switch (result.matchType) {
case 'exact': confidence += 40; break;
case 'interpolated': confidence += 30; break;
case 'approximate': confidence += 20; break;
case 'fallback': confidence += 10; break;
}
// Address component completeness (0-30 points)
const components = result.addressComponents;
if (components.streetNumber) confidence += 10;
if (components.street) confidence += 10;
if (components.postalCode) confidence += 10;
// Provider-specific confidence (0-30 points)
if (result.providerConfidence) {
confidence += (result.providerConfidence / 100) * 30;
}
return Math.min(Math.round(confidence), 100);
}
Confidence Levels:
- 81-100 (Excellent): Exact match with full address components
- 61-80 (Good): Interpolated match with most components
- 41-60 (Medium): Approximate match, missing some components
- 21-40 (Low): Fallback geocoding, significant uncertainty
- 0-20 (Very Low): Minimal match, likely incorrect
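A few worked totals may help show how the three components combine. The block below is a compact restatement of the scoring rules for illustration only; `score` and `MATCH_POINTS` are hypothetical names, and `componentsPresent` stands in for counting `streetNumber`, `street`, and `postalCode`.

```typescript
// Points per match type, as in calculateConfidence above.
const MATCH_POINTS = { exact: 40, interpolated: 30, approximate: 20, fallback: 10 } as const;

function score(
  matchType: keyof typeof MATCH_POINTS,
  componentsPresent: number,   // how many of streetNumber, street, postalCode (0-3)
  providerConfidence?: number  // provider's own 0-100 score, if any
): number {
  const raw = MATCH_POINTS[matchType]
    + componentsPresent * 10
    + ((providerConfidence ?? 0) / 100) * 30;
  return Math.min(Math.round(raw), 100);
}

score('exact', 3, 90);        // 40 + 30 + 27 = 97
score('exact', 3);            // 40 + 30 + 0  = 70
score('approximate', 1, 50);  // 20 + 10 + 15 = 45
```

One consequence of these weights: a provider that returns no confidence score of its own can reach at most 70, so its results never land in the Excellent band.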
Provider Success Rates
Metrics Tracked:
interface ProviderMetrics {
provider: GeocodeProvider;
totalAttempts: number;
successfulGeocodes: number;
successRate: number; // 0-100%
avgConfidence: number; // 0-100
avgResponseTime: number; // milliseconds
errorCount: number;
lastError?: string;
}
Success Rate Calculation:
const calculateProviderMetrics = async (): Promise<ProviderMetrics[]> => {
const locations = await prisma.location.findMany({
select: {
geocodeProvider: true,
geocodeConfidence: true,
latitude: true,
longitude: true
}
});
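// groupBy is assumed to come from lodash; locations with no provider are bucketed as UNKNOWN
const providerGroups = groupBy(locations, l => l.geocodeProvider ?? 'UNKNOWN');
return Object.entries(providerGroups).map(([provider, locs]) => {
const total = locs.length;
// Compare against null so that a legitimate coordinate of 0 still counts as successful
const successful = locs.filter(l => l.latitude != null && l.longitude != null).length;
const avgConf = locs.reduce((sum, l) => sum + (l.geocodeConfidence || 0), 0) / total;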
return {
provider: provider as GeocodeProvider,
totalAttempts: total,
successfulGeocodes: successful,
successRate: (successful / total) * 100,
avgConfidence: avgConf,
avgResponseTime: 0, // Would need separate tracking
errorCount: total - successful
};
});
};
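The same grouping can be done without a database or a utility library, which is convenient for unit tests. The sketch below is a dependency-free variation rather than the service's code: `summarizeByProvider`, `LocationRow`, and `ProviderSummary` are hypothetical names, and unlike the version above it averages confidence over successfully geocoded rows only, so failed lookups do not drag the average down.

```typescript
interface LocationRow {
  geocodeProvider: string | null;
  geocodeConfidence: number | null;
  latitude: number | null;
  longitude: number | null;
}

interface ProviderSummary {
  provider: string;
  total: number;
  successful: number;
  successRate: number;   // 0-100
  avgConfidence: number; // mean over successfully geocoded rows only
}

function summarizeByProvider(rows: LocationRow[]): ProviderSummary[] {
  // Bucket rows by provider; a missing provider is reported as UNKNOWN
  const groups = new Map<string, LocationRow[]>();
  for (const row of rows) {
    const key = row.geocodeProvider ?? 'UNKNOWN';
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }

  return Array.from(groups.entries()).map(([provider, locs]) => {
    const ok = locs.filter(l => l.latitude !== null && l.longitude !== null);
    const confSum = ok.reduce((sum, l) => sum + (l.geocodeConfidence ?? 0), 0);
    return {
      provider,
      total: locs.length,
      successful: ok.length,
      successRate: (ok.length / locs.length) * 100,
      avgConfidence: ok.length > 0 ? confSum / ok.length : 0,
    };
  });
}
```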
Duplicate Detection
Detection Methods:
- Exact Coordinate Match:
// Round to 6 decimal places (~0.1m precision)
const isDuplicateExact = (loc1: Location, loc2: Location): boolean => {
return loc1.latitude!.toFixed(6) === loc2.latitude!.toFixed(6) &&
loc1.longitude!.toFixed(6) === loc2.longitude!.toFixed(6);
};
- Proximity Threshold:
// Haversine distance check
const isDuplicateProximity = (loc1: Location, loc2: Location, thresholdM: number): boolean => {
const distance = haversineDistance(
[loc1.latitude!, loc1.longitude!],
[loc2.latitude!, loc2.longitude!]
);
return distance < thresholdM;
};
- Address Similarity:
import { distance as levenshteinDistance } from 'fastest-levenshtein';
const isDuplicateAddress = (addr1: string, addr2: string): boolean => {
const normalized1 = normalizeAddress(addr1);
const normalized2 = normalizeAddress(addr2);
const dist = levenshteinDistance(normalized1, normalized2);
const similarity = 1 - (dist / Math.max(normalized1.length, normalized2.length));
return similarity > 0.9; // 90% similar
};
const normalizeAddress = (address: string): string => {
return address
.toLowerCase()
.replace(/\bstreet\b/g, 'st')
.replace(/\bavenue\b/g, 'ave')
.replace(/\broad\b/g, 'rd')
.replace(/\bdrive\b/g, 'dr')
.replace(/[^a-z0-9]/g, '');
};
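The proximity check above calls `haversineDistance`, which is not defined in this document. A minimal sketch of such a function follows, assuming `[latitude, longitude]` pairs in degrees, a result in meters, and a spherical Earth with a mean radius of 6,371 km; the project may use a library implementation instead.

```typescript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters

const toRad = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance in meters between two [lat, lng] pairs given in degrees.
function haversineDistance(a: [number, number], b: [number, number]): number {
  const dLat = toRad(b[0] - a[0]);
  const dLng = toRad(b[1] - a[1]);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a[0])) * Math.cos(toRad(b[0])) * Math.sin(dLng / 2) ** 2;
  // Clamp to 1 to guard against floating-point overshoot before asin
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(h)));
}
```

At the 1 m default threshold the spherical approximation is more than precise enough; its error is a small fraction of a percent of the measured distance.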
Address Validation
Validation Checks:
interface AddressValidationResult {
isValid: boolean;
issues: string[];
suggestions?: string[];
}
const validateAddress = (address: string): AddressValidationResult => {
const issues: string[] = [];
// Check minimum length
if (address.length < 5) {
issues.push('Address too short');
}
// Check for street number
if (!/^\d+/.test(address)) {
issues.push('Missing street number');
}
// Check for street name
if (!/\d+\s+([A-Za-z]+\s*)+/.test(address)) {
issues.push('Missing street name');
}
// Check for postal code (Canadian format)
if (!/[A-Z]\d[A-Z]\s?\d[A-Z]\d/.test(address)) {
issues.push('Missing or invalid postal code');
}
// Check for unusual characters
if (/[^A-Za-z0-9\s,.-]/.test(address)) {
issues.push('Contains unusual characters');
}
return {
isValid: issues.length === 0,
issues
};
};
Admin Workflow
Navigate to Data Quality Dashboard
Step 1: Access Dashboard
- Log in as SUPER_ADMIN or MAP_ADMIN
- Click Map in sidebar
- Click Data Quality submenu
- Dashboard loads with statistics
Step 2: Review Overall Statistics
Dashboard displays 4 main statistic cards:
┌──────────────────┬──────────────────┬──────────────────┬──────────────────┐
│ Total Locations │ Geocoded │ Avg Confidence │ Low Confidence │
│ 1,500 │ 1,450 (96.7%) │ 78.5 │ 50 │
└──────────────────┴──────────────────┴──────────────────┴──────────────────┘
Step 3: Analyze Provider Performance
Provider breakdown table shows:
| Provider | Count | Success Rate | Avg Confidence |
|---|---|---|---|
| GOOGLE | 800 | 99.2% | 85.3 |
| MAPBOX | 350 | 97.1% | 82.1 |
| NOMINATIM | 200 | 94.5% | 75.8 |
| PHOTON | 100 | 91.0% | 68.2 |
| UNKNOWN | 50 | N/A | 0 |
Step 4: Review Confidence Distribution
Bar chart displays confidence distribution:
Confidence Distribution
100 | ┌──────┐
80 | │ │
60 | ┌──────┤ │
40 | ┌──────┤ │ │
20 | │ │ │ │
0 └──┴──────┴──────┴──────┴──────┘
0-20 21-40 41-60 61-80 81-100
15 35 150 450 800
Identify and Review Low-Confidence Locations
Step 1: Filter Low-Confidence Locations
- Click Low Confidence tab on dashboard
- Table loads with locations where confidence < 50
- Sort by confidence (ascending) to prioritize worst
Step 2: Review Location Details
Click row to open detail drawer:
┌─────────────────────────────────────────┐
│ Location Details │
├─────────────────────────────────────────┤
│ Address: 123 Main St │
│ Postal Code: M5H 2N2 │
│ Coordinates: 43.6532, -79.3832 │
│ │
│ Geocoding Info: │
│ Confidence: 45 (Low) │
│ Provider: NOMINATIM │
│ Geocoded: Feb 10, 2025 10:00 AM │
│ │
│ Issues: │
│ • Missing street number in response │
│ • Approximate match only │
│ │
│ [Re-geocode] [Edit Address] [View Map] │
└─────────────────────────────────────────┘
Step 3: Take Action
Options for remediation:
- Re-geocode with different provider:
  - Click Re-geocode button
  - Select provider (GOOGLE recommended for low confidence)
  - Click Geocode Now
  - New confidence displayed
- Edit address:
  - Click Edit Address
  - Correct typos or formatting issues
  - Save changes
  - Auto-triggers re-geocoding
- View on map:
  - Click View Map
  - Verify location accuracy visually
  - Drag marker to correct position if needed
Bulk Re-geocoding
Step 1: Select Locations
- In Low Confidence tab, use table checkboxes to select locations
- Or click Select All to select all visible
- Selected count displays: "50 selected"
Step 2: Choose Provider
- Click Bulk Re-geocode button
- Modal opens with provider selection:
┌─────────────────────────────────────┐
│ Bulk Re-geocode │
├─────────────────────────────────────┤
│ Re-geocode 50 locations │
│ │
│ Provider: [GOOGLE ▼] │
│ │
│ Options: │
│ ☑ Only if confidence < 50 │
│ ☑ Cache results │
│ ☐ Overwrite existing coordinates │
│ │
│ Estimated time: ~2 minutes │
│ │
│ [Cancel] [Start Re-geocoding] │
└─────────────────────────────────────┘
Step 3: Monitor Progress
- Job starts, progress bar appears:
Re-geocoding in progress... 25/50 (50%)
[████████████░░░░░░░░░░░░] 50%
- Real-time updates:
  - Total processed
  - Successful geocodes
  - Failed geocodes
  - Average new confidence
Step 4: Review Results
Job completion summary:
┌─────────────────────────────────────┐
│ Bulk Re-geocode Complete │
├─────────────────────────────────────┤
│ Processed: 50 │
│ Successful: 47 (94%) │
│ Failed: 3 (6%) │
│ │
│ Quality Improvement: │
│ Avg Confidence Before: 42.5 │
│ Avg Confidence After: 81.3 │
│ Improvement: +38.8 │
│ │
│ [View Failed] [Close] │
└─────────────────────────────────────┘
Handle Duplicates
Step 1: View Duplicates Tab
- Click Duplicates tab on dashboard
- Table groups locations by coordinates
Step 2: Review Duplicate Groups
Table displays:
| Coordinates | Count | Addresses | Action |
|---|---|---|---|
| 43.6532, -79.3832 | 3 | 123 Main St, 123 Main Street, 123 Main St Unit 1 | [Review] |
| 43.6540, -79.3825 | 2 | 456 Bay St, 456 Bay Street | [Review] |
Step 3: Resolve Duplicates
Click Review to open resolution modal:
┌─────────────────────────────────────┐
│ Resolve Duplicates │
├─────────────────────────────────────┤
│ 3 locations at 43.6532, -79.3832 │
│ │
│ ○ Merge into single location │
│ Primary: 123 Main St │
│ Merge units from duplicates │
│ │
│ ○ Keep as separate multi-unit │
│ Mark as validated multi-unit │
│ │
│ ○ Re-geocode individually │
│ Try to get unique coordinates │
│ │
│ [Cancel] [Resolve] │
└─────────────────────────────────────┘
Resolution Options:
- Merge: Combine into single Location with multiple Address records
- Multi-unit: Mark as legitimate multi-unit building
- Re-geocode: Attempt to get unique coordinates for each
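For the Merge option, something has to decide which location survives. The sketch below shows one possible rule, keeping the highest-confidence location and breaking ties by lowest id; `planMerge`, `DuplicateLocation`, and `MergePlan` are hypothetical names, and the rule itself is an assumption rather than documented behavior.

```typescript
interface DuplicateLocation {
  id: number;
  address: string;
  geocodeConfidence: number | null;
}

interface MergePlan {
  primaryId: number;
  mergedIds: number[]; // locations whose Address records move to the primary
}

// Hypothetical rule: highest confidence wins, ties go to the lowest id,
// so the outcome is deterministic for a given group.
function planMerge(group: DuplicateLocation[]): MergePlan {
  const sorted = [...group].sort(
    (a, b) => (b.geocodeConfidence ?? 0) - (a.geocodeConfidence ?? 0) || a.id - b.id
  );
  const [primary, ...rest] = sorted;
  if (!primary || rest.length === 0) {
    throw new Error('A duplicate group needs at least two locations');
  }
  return { primaryId: primary.id, mergedIds: rest.map(l => l.id) };
}
```

Since the Address relation is declared with `onDelete: Cascade`, any real merge would need to reassign `locationId` on the affected Address rows before deleting the merged locations, ideally within a single transaction.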
Quality Improvement Strategies
Multi-Provider Geocoding
Fallback Chain:
// geocoding.service.ts
const PROVIDER_CHAIN: GeocodeProvider[] = [
'GOOGLE', // Primary: Best accuracy, paid
'MAPBOX', // Fallback 1: Good accuracy, paid
'NOMINATIM', // Fallback 2: Free, decent accuracy
'PHOTON', // Fallback 3: Free, lower accuracy
'ARCGIS' // Fallback 4: Free, basic accuracy
];
async geocode(address: string): Promise<GeocodeResult | null> {
for (const provider of PROVIDER_CHAIN) {
try {
const result = await this.geocodeWithProvider(address, provider);
if (result && result.confidence >= 50) {
return result; // Success, confidence acceptable
}
} catch (error) {
logger.warn(`Geocoding failed with ${provider}:`, error);
// Try next provider
}
}
return null; // All providers failed
}
Benefits:
- Increases success rate (90% → 96%+)
- Reduces dependency on single provider
- Cost optimization (use free providers as fallback)
- Provider outage resilience
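The loop above returns `null` when every provider responds but none reaches a confidence of 50, which discards coordinates that might still be usable for manual review. One possible variation, sketched here, keeps the best attempt in that case. `pickBestResult` and `ProviderAttempt` are hypothetical names, and the sketch assumes the attempts have already been collected in chain order.

```typescript
interface ProviderAttempt {
  provider: string;
  confidence: number; // 0-100
  latitude: number;
  longitude: number;
}

// Returns the first attempt that meets the threshold (preserving chain order);
// if none does, falls back to the highest-confidence attempt, or null if empty.
function pickBestResult(attempts: ProviderAttempt[], threshold = 50): ProviderAttempt | null {
  const acceptable = attempts.find(a => a.confidence >= threshold);
  if (acceptable) return acceptable;
  return attempts.reduce<ProviderAttempt | null>(
    (best, a) => (best === null || a.confidence > best.confidence ? a : best),
    null
  );
}
```

In a live fallback chain the loop would still stop at the first acceptable provider to save API calls; the fallback branch only matters once the chain is exhausted.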
Address Normalization
Pre-Geocoding Normalization:
const normalizeAddressForGeocoding = (address: string): string => {
let normalized = address;
// Remove extra whitespace
normalized = normalized.replace(/\s+/g, ' ').trim();
// Standardize abbreviations
const replacements: Record<string, string> = {
'Street': 'St',
'Avenue': 'Ave',
'Road': 'Rd',
'Drive': 'Dr',
'Boulevard': 'Blvd',
'Apartment': 'Apt',
'Unit': 'Unit',
'Suite': 'Ste'
};
Object.entries(replacements).forEach(([long, short]) => {
const regex = new RegExp(`\\b${long}\\b`, 'gi');
normalized = normalized.replace(regex, short);
});
// Ensure postal code spacing (Canadian format)
normalized = normalized.replace(/([A-Z]\d[A-Z])(\d[A-Z]\d)/, '$1 $2');
// Remove periods from abbreviations
normalized = normalized.replace(/\./g, '');
return normalized;
};
Improvements:
- Reduces geocoding errors by 10-15%
- Increases confidence scores
- Better cache hit rate
Geocoding Cache
Redis Cache Implementation:
// geocoding.service.ts
private async geocodeWithCache(address: string): Promise<GeocodeResult | null> {
const cacheKey = `geocode:${normalizeAddress(address)}`;
// Check cache
const cached = await redis.get(cacheKey);
if (cached) {
logger.debug('Geocoding cache hit:', address);
return JSON.parse(cached);
}
// Cache miss, geocode
const result = await this.geocode(address);
if (result) {
// Cache for 30 days
await redis.setex(cacheKey, 2592000, JSON.stringify(result));
}
return result;
}
Benefits:
- Reduces API costs (90% cache hit rate)
- Faster response times (Redis: <5ms vs API: 200-500ms)
- Consistent results for same address
- Provider API rate limit avoidance
Manual Verification
Critical Location Verification:
Manually verify high-priority locations:
- Campaign offices: Ensure exact coordinates
- Shift start points: Verify accessibility
- Event venues: Confirm entrance location
- Polling stations: Critical for voter info
Verification Process:
// Mark location as manually verified
await prisma.location.update({
where: { id: locationId },
data: {
geocodeConfidence: 100,
geocodeProvider: 'MANUAL', // Not a member of the GeocodeProvider enum above; accepted because the column is a plain String
geocodedAt: new Date()
}
});
Regular Audits
Monthly Quality Audit Checklist:
- Run quality report:
curl http://localhost:4000/api/locations/geocode-stats
- Check metrics against thresholds:
  - Geocoded % > 95%
  - Avg confidence > 70
  - Low confidence count < 50
  - Duplicates < 20
- Review low-confidence locations:
  - Filter locations with confidence < 50
  - Review top 20 by address
  - Identify patterns (specific streets, providers)
- Bulk re-geocode low confidence:
  - Use GOOGLE provider for accuracy
  - Monitor improvement in avg confidence
- Resolve duplicates:
  - Review all duplicate groups
  - Merge or mark as multi-unit
  - Update addresses as needed
- Export quality report:
const report = await generateQualityReport();
fs.writeFileSync(`quality-report-${date}.json`, JSON.stringify(report, null, 2));
Code Examples
DataQualityDashboardPage.tsx
import React, { useEffect, useState } from 'react';
import { Card, Row, Col, Statistic, Table, Tabs, Button, message } from 'antd';
import { WarningOutlined, CheckCircleOutlined } from '@ant-design/icons';
import { api } from '@/lib/api';
import { Bar } from 'react-chartjs-2';
interface GeocodeStats {
total: number;
geocoded: number;
geocodedPercent: number;
avgConfidence: number;
providerBreakdown: Record<string, number>;
confidenceDistribution: Record<string, number>;
lowConfidenceCount: number;
missingCoordinates: number;
duplicatesCount: number;
}
const DataQualityDashboardPage: React.FC = () => {
const [stats, setStats] = useState<GeocodeStats | null>(null);
const [lowConfLocations, setLowConfLocations] = useState<any[]>([]);
const [duplicates, setDuplicates] = useState<any[]>([]);
const [loading, setLoading] = useState(false);
useEffect(() => {
fetchStats();
fetchLowConfidenceLocations();
fetchDuplicates();
}, []);
const fetchStats = async () => {
setLoading(true);
try {
const { data } = await api.get<GeocodeStats>('/locations/geocode-stats');
setStats(data);
} catch (error) {
message.error('Failed to load statistics');
} finally {
setLoading(false);
}
};
const fetchLowConfidenceLocations = async () => {
try {
const { data } = await api.get('/locations?geocodeConfidence=lt:50&limit=100');
setLowConfLocations(data.data);
} catch (error) {
message.error('Failed to load low-confidence locations');
}
};
const fetchDuplicates = async () => {
try {
const { data } = await api.get('/locations/duplicates');
setDuplicates(data.duplicates);
} catch (error) {
message.error('Failed to load duplicates');
}
};
const handleRegeocodeLocation = async (locationId: number) => {
try {
await api.post(`/locations/${locationId}/regeocode`, { provider: 'GOOGLE' });
message.success('Location re-geocoded successfully');
fetchStats();
fetchLowConfidenceLocations();
} catch (error) {
message.error('Failed to re-geocode location');
}
};
const confidenceChartData = stats ? {
labels: Object.keys(stats.confidenceDistribution),
datasets: [{
label: 'Locations',
data: Object.values(stats.confidenceDistribution),
backgroundColor: [
'#e74c3c', // 0-20: Red
'#f39c12', // 21-40: Orange
'#f1c40f', // 41-60: Yellow
'#3498db', // 61-80: Blue
'#27ae60' // 81-100: Green
]
}]
} : null;
const lowConfColumns = [
{ title: 'Address', dataIndex: 'address', key: 'address' },
{ title: 'Confidence', dataIndex: 'geocodeConfidence', key: 'confidence', render: (val: number) => (
<span style={{ color: val < 30 ? '#e74c3c' : '#f39c12' }}>{val}</span>
)},
{ title: 'Provider', dataIndex: 'geocodeProvider', key: 'provider' },
{ title: 'Action', key: 'action', render: (_: any, record: any) => (
<Button size="small" onClick={() => handleRegeocodeLocation(record.id)}>
Re-geocode
</Button>
)}
];
return (
<div>
<h1>Data Quality Dashboard</h1>
{/* Statistics Cards */}
<Row gutter={16} style={{ marginBottom: 24 }}>
<Col span={6}>
<Card>
<Statistic
title="Total Locations"
value={stats?.total || 0}
prefix={<CheckCircleOutlined />}
/>
</Card>
</Col>
<Col span={6}>
<Card>
<Statistic
title="Geocoded"
value={stats?.geocoded || 0}
suffix={`(${stats?.geocodedPercent.toFixed(1) || 0}%)`}
valueStyle={{ color: (stats?.geocodedPercent || 0) > 95 ? '#27ae60' : '#f39c12' }}
/>
</Card>
</Col>
<Col span={6}>
<Card>
<Statistic
title="Avg Confidence"
value={stats?.avgConfidence.toFixed(1) || 0}
valueStyle={{ color: (stats?.avgConfidence || 0) > 70 ? '#27ae60' : '#f39c12' }}
/>
</Card>
</Col>
<Col span={6}>
<Card>
<Statistic
title="Low Confidence"
value={stats?.lowConfidenceCount || 0}
prefix={<WarningOutlined />}
valueStyle={{ color: (stats?.lowConfidenceCount || 0) > 50 ? '#e74c3c' : '#f39c12' }}
/>
</Card>
</Col>
</Row>
{/* Charts and Tables */}
<Tabs
items={[
{
key: 'overview',
label: 'Overview',
children: (
<div>
<Card title="Confidence Distribution" style={{ marginBottom: 24 }}>
{confidenceChartData && <Bar data={confidenceChartData} />}
</Card>
<Card title="Provider Performance">
<Table
dataSource={stats ? Object.entries(stats.providerBreakdown).map(([provider, count]) => ({
provider,
count
})) : []}
columns={[
{ title: 'Provider', dataIndex: 'provider', key: 'provider' },
{ title: 'Count', dataIndex: 'count', key: 'count' }
]}
pagination={false}
/>
</Card>
</div>
)
},
{
key: 'low-confidence',
label: `Low Confidence (${lowConfLocations.length})`,
children: (
<Table
dataSource={lowConfLocations}
columns={lowConfColumns}
rowKey="id"
loading={loading}
/>
)
},
{
key: 'duplicates',
label: `Duplicates (${duplicates.length})`,
children: (
<Table
dataSource={duplicates}
columns={[
{ title: 'Coordinates', key: 'coords', render: (_, record: any) =>
`${record.coordinates.latitude.toFixed(6)}, ${record.coordinates.longitude.toFixed(6)}`
},
{ title: 'Count', dataIndex: 'count', key: 'count' },
{ title: 'Addresses', key: 'addresses', render: (_, record: any) =>
record.locations.map((l: any) => l.address).join(', ')
}
]}
rowKey={(record) => `${record.coordinates.latitude}-${record.coordinates.longitude}`}
/>
)
}
]}
/>
</div>
);
};
export default DataQualityDashboardPage;
Geocode Statistics Service
// locations.service.ts
import { prisma } from '@/config/database';
import type { GeocodeProvider } from '@prisma/client';
export class LocationsService {
async getGeocodeStats() {
const locations = await prisma.location.findMany({
select: {
id: true,
latitude: true,
longitude: true,
geocodeConfidence: true,
geocodeProvider: true
}
});
const total = locations.length;
// Compare against null so a valid 0 coordinate is not treated as missing
const geocoded = locations.filter(l => l.latitude != null && l.longitude != null).length;
const sumConfidence = locations.reduce((sum, l) => sum + (l.geocodeConfidence || 0), 0);
const avgConfidence = total > 0 ? sumConfidence / total : 0;
// Provider breakdown
const providerBreakdown: Record<string, number> = {};
locations.forEach(l => {
const provider = l.geocodeProvider || 'UNKNOWN';
providerBreakdown[provider] = (providerBreakdown[provider] || 0) + 1;
});
// Confidence distribution
const confidenceDistribution = {
'0-20': 0,
'21-40': 0,
'41-60': 0,
'61-80': 0,
'81-100': 0
};
locations.forEach(l => {
const conf = l.geocodeConfidence || 0;
if (conf <= 20) confidenceDistribution['0-20']++;
else if (conf <= 40) confidenceDistribution['21-40']++;
else if (conf <= 60) confidenceDistribution['41-60']++;
else if (conf <= 80) confidenceDistribution['61-80']++;
else confidenceDistribution['81-100']++;
});
const lowConfidenceCount = locations.filter(l => (l.geocodeConfidence || 0) < 50).length;
const duplicatesCount = await this.countDuplicates();
return {
total,
geocoded,
geocodedPercent: total > 0 ? (geocoded / total) * 100 : 0,
avgConfidence,
providerBreakdown,
confidenceDistribution,
lowConfidenceCount,
missingCoordinates: total - geocoded,
duplicatesCount
};
}
async countDuplicates(): Promise<number> {
const locations = await prisma.location.findMany({
where: {
AND: [
{ latitude: { not: null } },
{ longitude: { not: null } }
]
},
select: { latitude: true, longitude: true }
});
const coordMap = new Map<string, number>();
locations.forEach(l => {
const key = `${l.latitude!.toFixed(6)},${l.longitude!.toFixed(6)}`;
coordMap.set(key, (coordMap.get(key) || 0) + 1);
});
return Array.from(coordMap.values()).filter(count => count > 1).reduce((sum, count) => sum + count, 0);
}
async regeocode(locationId: number, provider?: GeocodeProvider) {
const location = await prisma.location.findUnique({
where: { id: locationId }
});
if (!location) {
throw new Error('Location not found');
}
const result = await geocodingService.geocode(location.address, provider);
if (!result) {
throw new Error('Geocoding failed');
}
return await prisma.location.update({
where: { id: locationId },
data: {
latitude: result.latitude,
longitude: result.longitude,
geocodeConfidence: result.confidence,
geocodeProvider: result.provider,
geocodedAt: new Date()
}
});
}
}
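The bucketing rule inside getGeocodeStats can be pulled out into a pure function, which makes the boundary behaviour (20 falls in '0-20', 21 in '21-40', missing confidence counts as 0) easy to unit test. The following is a minimal sketch; the names confidenceBucket and confidenceDistribution are hypothetical and not part of the service shown above.

```typescript
// Hypothetical helpers; the bucket edges mirror getGeocodeStats above.
type ConfidenceBucket = '0-20' | '21-40' | '41-60' | '61-80' | '81-100';

// Map a 0-100 confidence score to its bucket; null/undefined counts as 0.
function confidenceBucket(confidence: number | null | undefined): ConfidenceBucket {
  const conf = confidence ?? 0;
  if (conf <= 20) return '0-20';
  if (conf <= 40) return '21-40';
  if (conf <= 60) return '41-60';
  if (conf <= 80) return '61-80';
  return '81-100';
}

// Count how many scores fall into each bucket in a single pass.
function confidenceDistribution(
  scores: Array<number | null | undefined>
): Record<ConfidenceBucket, number> {
  const dist: Record<ConfidenceBucket, number> = {
    '0-20': 0,
    '21-40': 0,
    '41-60': 0,
    '61-80': 0,
    '81-100': 0
  };
  for (const score of scores) {
    dist[confidenceBucket(score)]++;
  }
  return dist;
}
```

Note that locations without any geocode result land in the '0-20' bucket under this rule, just as they do in the service, so that bucket mixes poor matches with missing ones.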
Troubleshooting
Problem: Many low-confidence locations
Symptoms:
- More than 100 locations with confidence < 50
- Avg confidence < 60
- Prometheus alert firing
Solutions:
- Check provider API keys:
# Test Google Geocoding API
curl "https://maps.googleapis.com/maps/api/geocode/json?address=123+Main+St+Toronto&key=YOUR_KEY"
# Verify the key is set in .env
grep GEOCODE_GOOGLE_API_KEY .env
- Try different primary provider:
# In .env, change primary provider
GEOCODE_PRIMARY_PROVIDER=GOOGLE # Most accurate
# Or try:
GEOCODE_PRIMARY_PROVIDER=MAPBOX # Good alternative
- Verify address format:
// Bad: Missing city/postal
"123 Main St"
// Good: Full address
"123 Main St, Toronto ON M5H 2N2"
- Use postal code for better accuracy:
// Append postal code if available
const fullAddress = location.postalCode
? `${location.address}, ${location.postalCode}`
: location.address;
- Bulk re-geocode with Google:
# Via API
curl -X POST http://localhost:4000/api/locations/bulk-geocode \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"provider":"GOOGLE","confidenceThreshold":50}'
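The address-format and postal-code advice above can be combined into a small pre-flight check that runs before an address is sent to a provider. This is a sketch only: hasPostalCode and buildGeocodeQuery are hypothetical names, and the pattern is a simplified Canadian postal code shape (letter-digit-letter, digit-letter-digit) that does not reject the letters Canada Post never uses.

```typescript
// Hypothetical helpers, not part of the codebase described above.
// Simplified Canadian postal code shape: A1A 1A1, A1A1A1 or A1A-1A1.
const CA_POSTAL_CODE = /\b[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d\b/;

// True when the address string already contains something shaped like a postal code.
function hasPostalCode(address: string): boolean {
  return CA_POSTAL_CODE.test(address);
}

interface AddressParts {
  address: string;
  postalCode?: string | null;
}

// Append the postal code only when it is known and not already in the string.
function buildGeocodeQuery(loc: AddressParts): string {
  const address = loc.address.trim();
  const postal = loc.postalCode ? loc.postalCode.trim() : '';
  if (!postal || hasPostalCode(address)) return address;
  return `${address}, ${postal}`;
}
```

Addresses for which hasPostalCode returns false and no postalCode is stored are reasonable candidates for manual review rather than another geocoding attempt.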
Problem: Duplicate locations detected
Symptoms:
- Multiple locations at same coordinates
- Duplicates tab shows many groups
- Inflated location counts in cuts
Solutions:
- Check if legitimately multi-unit:
-- Find buildings with multiple addresses
SELECT l.id, l.address, COUNT(a.id) as unit_count
FROM "Location" l
JOIN "Address" a ON a."locationId" = l.id
GROUP BY l.id
HAVING COUNT(a.id) > 1;
- Verify geocoding precision:
// Check if rounding issue
const isDuplicateRounding = (loc1, loc2) => {
// Use 4 decimal places (~11m precision) instead of 6 (~0.1m)
return loc1.latitude.toFixed(4) === loc2.latitude.toFixed(4) &&
loc1.longitude.toFixed(4) === loc2.longitude.toFixed(4);
};
- Review NAR import process:
// Ensure LOC_GUID unique constraint
const location = await prisma.location.upsert({
where: { locGuid: narRecord.LOC_GUID },
update: { /* update fields */ },
create: { /* create fields */ }
});
- Merge duplicates:
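// Merge function: move addresses to the primary location, then delete the duplicates
const mergeDuplicates = async (primaryId: number, duplicateIds: number[]) => {
// Never delete the primary, even if it was passed in by mistake
const ids = duplicateIds.filter(id => id !== primaryId);
// Run both steps in one transaction so a failure cannot leave a half-merged state
await prisma.$transaction([
prisma.address.updateMany({
where: { locationId: { in: ids } },
data: { locationId: primaryId }
}),
prisma.location.deleteMany({
where: { id: { in: ids } }
})
]);
};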
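Before calling mergeDuplicates, something has to decide which record in each group survives. The sketch below groups locations by rounded coordinates (the same toFixed keying used in countDuplicates) and keeps the highest-confidence record as the primary, breaking ties by lowest id. The GeoLocation shape, the planMerges name, and the tie-break policy are all assumptions made for illustration.

```typescript
// Hypothetical shapes and policy; adjust to the real Location model.
interface GeoLocation {
  id: number;
  latitude: number;
  longitude: number;
  geocodeConfidence: number | null;
}

interface MergePlan {
  primaryId: number;
  duplicateIds: number[];
}

// Group locations whose coordinates match after rounding to `decimals` places,
// then keep the highest-confidence record (lowest id on ties) as the primary.
function planMerges(locations: GeoLocation[], decimals = 6): MergePlan[] {
  const groups: Record<string, GeoLocation[]> = {};
  for (const loc of locations) {
    const key = `${loc.latitude.toFixed(decimals)},${loc.longitude.toFixed(decimals)}`;
    (groups[key] = groups[key] || []).push(loc);
  }

  const plans: MergePlan[] = [];
  for (const key of Object.keys(groups)) {
    const group = groups[key];
    if (group.length < 2) continue; // a single record is not a duplicate
    const sorted = group.slice().sort(
      (a, b) => (b.geocodeConfidence ?? 0) - (a.geocodeConfidence ?? 0) || a.id - b.id
    );
    plans.push({
      primaryId: sorted[0].id,
      duplicateIds: sorted.slice(1).map(l => l.id)
    });
  }
  return plans;
}
```

Each plan can then be passed to mergeDuplicates(plan.primaryId, plan.duplicateIds). Lowering decimals to 4 widens the match radius to roughly 11 m, which will also group genuinely distinct neighbouring buildings, so plans produced at that precision should be reviewed manually.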
Problem: Geocoding stats slow to load
Symptoms:
- GET /api/locations/geocode-stats takes > 5 seconds
- Dashboard timeout errors
- High database CPU
Solutions:
- Add database indexes:
CREATE INDEX CONCURRENTLY idx_locations_geocode_confidence
ON "Location"("geocodeConfidence");
CREATE INDEX CONCURRENTLY idx_locations_geocode_provider
ON "Location"("geocodeProvider");
CREATE INDEX CONCURRENTLY idx_locations_coords
ON "Location"(latitude, longitude)
WHERE latitude IS NOT NULL AND longitude IS NOT NULL;
- Cache stats in Redis:
// Cache for 5 minutes
const getCachedStats = async () => {
const cached = await redis.get('geocode:stats');
if (cached) return JSON.parse(cached);
const stats = await locationsService.getGeocodeStats();
await redis.setex('geocode:stats', 300, JSON.stringify(stats));
return stats;
};
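- Aggregate in SQL instead of in application code:
// Raw SQL: one row per provider, aggregated by the database
// Casts return plain JS numbers (COUNT yields bigint in PostgreSQL)
const stats = await prisma.$queryRaw`
SELECT
"geocodeProvider",
COUNT(*)::int AS total,
COUNT(*) FILTER (WHERE latitude IS NOT NULL AND longitude IS NOT NULL)::int AS geocoded,
AVG(COALESCE("geocodeConfidence", 0))::float AS avg_confidence,
COUNT(*) FILTER (WHERE COALESCE("geocodeConfidence", 0) < 50)::int AS low_confidence
FROM "Location"
GROUP BY "geocodeProvider"
`;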
- Materialize stats view:
-- Create materialized view
CREATE MATERIALIZED VIEW geocode_stats_mv AS
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE latitude IS NOT NULL AND longitude IS NOT NULL) as geocoded,
AVG(COALESCE("geocodeConfidence", 0)) as avg_confidence,
COUNT(*) FILTER (WHERE COALESCE("geocodeConfidence", 0) < 50) as low_confidence
FROM "Location";
-- REFRESH runs once per call; schedule it (e.g. hourly via cron or pg_cron)
REFRESH MATERIALIZED VIEW geocode_stats_mv;
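The raw SQL query grouped by "geocodeProvider" returns one row per provider, so the dashboard totals still have to be combined in application code. The sketch below shows one way to do that; the ProviderRow shape and the combineProviderRows name are assumptions, and counts are assumed to be plain numbers already (convert with Number() if the driver returns BigInt). The overall average is weighted by group size, because a plain mean of per-provider averages would over-weight small providers.

```typescript
// Hypothetical row shape for the grouped query.
interface ProviderRow {
  geocodeProvider: string | null;
  total: number;
  geocoded: number;
  avg_confidence: number;
  low_confidence: number;
}

interface OverallStats {
  total: number;
  geocoded: number;
  avgConfidence: number;
  lowConfidenceCount: number;
  providerBreakdown: Record<string, number>;
}

// Fold per-provider rows into the totals the dashboard displays.
function combineProviderRows(rows: ProviderRow[]): OverallStats {
  let total = 0;
  let geocoded = 0;
  let lowConfidenceCount = 0;
  let weightedSum = 0;
  const providerBreakdown: Record<string, number> = {};

  for (const row of rows) {
    total += row.total;
    geocoded += row.geocoded;
    lowConfidenceCount += row.low_confidence;
    // Weight each group's average by its row count.
    weightedSum += row.avg_confidence * row.total;
    providerBreakdown[row.geocodeProvider ?? 'UNKNOWN'] = row.total;
  }

  return {
    total,
    geocoded,
    avgConfidence: total > 0 ? weightedSum / total : 0,
    lowConfidenceCount,
    providerBreakdown
  };
}
```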
Performance Considerations
Database Query Optimization
Indexes:
- geocodeConfidence (filtering)
- geocodeProvider (grouping)
- (latitude, longitude) composite (duplicate detection)
- Partial index on non-null coordinates
Query Performance:
- geocode-stats: ~500ms (1500 locations)
- Low confidence filter: ~100ms (with index)
- Duplicate detection: ~200ms (coordinate grouping)
- Bulk re-geocode: ~2-5 min (150 locations, depends on provider)
API Rate Limits
Provider Limits:
- Google: 50 QPS, $5/1000 requests
- Mapbox: 100,000/month free, then $0.50/1000
- Nominatim: 1 QPS (public), no commercial use
- Photon: No official limit, self-hosted recommended
- ArcGIS: 100,000/month free
Optimization:
- Use Redis cache (30-day TTL)
- Batch geocoding jobs (avoid rate limits)
- Fall back to free providers for non-critical lookups
- Monitor usage via provider dashboards
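Batching to stay under a provider limit comes down to two small calculations: splitting the work into fixed-size batches, and working out how long each batch must take so the average request rate stays at or below the limit. The helpers below are a hypothetical sketch of that arithmetic, not the bulk geocoding job itself.

```typescript
// Hypothetical helpers for pacing bulk geocoding requests.

// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Minimum time (ms) one batch must occupy so the average rate stays <= maxQps.
function minBatchDurationMs(batchSize: number, maxQps: number): number {
  if (maxQps <= 0) throw new Error('maxQps must be positive');
  return Math.ceil((batchSize / maxQps) * 1000);
}
```

For example, minBatchDurationMs(150, 1) is 150000, so 150 locations at a 1 QPS limit need at least 2.5 minutes, which is in line with the bulk re-geocode timing listed under Query Performance.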
Caching Strategy
Cache Layers:
- Application Cache (Redis):
// 30-day TTL for geocode results
const cacheKey = `geocode:${normalizeAddress(address)}`;
await redis.setex(cacheKey, 2592000, JSON.stringify(result));
- Statistics Cache:
// 5-minute TTL for stats
await redis.setex('geocode:stats', 300, JSON.stringify(stats));
- Provider Response Cache:
// 7-day TTL; cache raw provider responses separately, keyed by normalized address
await redis.setex(`provider:${provider}:${normalizeAddress(address)}`, 604800, JSON.stringify(rawResponse));
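The cache keys above depend on normalizeAddress, which is not shown in this document. High hit rates require that trivially different spellings of the same address map to the same key, so a normalizer along the following lines is assumed. This is a hypothetical sketch; the real implementation may apply different or additional rules (for example, expanding street-type abbreviations).

```typescript
// Hypothetical implementation; the real normalizeAddress may differ.
function normalizeAddress(address: string): string {
  return address
    .toLowerCase()
    .replace(/[.,#]/g, ' ') // treat common punctuation as whitespace
    .replace(/\s+/g, ' ')   // collapse runs of whitespace
    .trim();
}

// Build the Redis key used for cached geocode results.
function geocodeCacheKey(address: string): string {
  return `geocode:${normalizeAddress(address)}`;
}
```

With this rule, "123 MAIN ST, Toronto ON" and "  123 Main St.,  Toronto ON " produce the same key, so the second lookup is served from cache.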
Cache Hit Rates:
- Geocoding: 90%+ (repeated addresses)
- Statistics: 95%+ (frequent dashboard views)
- Provider responses: 85%+ (re-geocoding attempts)
Related Documentation
Backend Documentation
- Locations Service: api/src/modules/map/locations/locations.service.ts
  - Geocode stats aggregation
  - Duplicate detection
  - Re-geocoding operations
- Geocoding Service: api/src/modules/map/geocoding/geocoding.service.ts
  - Multi-provider fallback
  - Confidence calculation
  - Cache integration
- Bulk Geocoding: api/src/modules/map/locations/bulk-geocode.routes.ts
  - Job queue integration
  - Progress tracking
  - Error handling
Frontend Documentation
- Data Quality Dashboard: admin/src/pages/DataQualityDashboardPage.tsx
  - Statistics display
  - Charts and tables
  - Bulk actions
- Locations Page: admin/src/pages/LocationsPage.tsx
  - CSV import/export
  - Inline geocoding
  - Address editing
Database Documentation
- Location Model: api/prisma/schema.prisma
  - Geocoding metadata fields
  - Indexes for performance
  - Relations to Address
Monitoring Documentation
- Prometheus Metrics: api/src/utils/metrics.ts
  - Custom geocoding metrics
  - Quality gauges
  - Alert integration
- Grafana Dashboard: configs/grafana/dashboards/data-quality.json
  - Quality trend charts
  - Provider comparison
  - Alert visualization
External Resources
- Google Geocoding API: https://developers.google.com/maps/documentation/geocoding
- Mapbox Geocoding API: https://docs.mapbox.com/api/search/geocoding
- Nominatim API: https://nominatim.org/release-docs/latest/api/Search
- Photon API: https://photon.komoot.io