# Multi-Provider Geocoding Service

## Overview

The geocoding service provides automated address-to-coordinate conversion using a six-provider fallback chain. It enables campaigns to quickly convert voter addresses to map coordinates, with confidence scoring, Redis caching, and BullMQ queue integration for bulk operations.

**Key Capabilities:**

- **6 Geocoding Providers**: Google, Mapbox, Nominatim, Photon, LocationIQ, ArcGIS
- **Provider Fallback Chain**: Try providers in order until success
- **Confidence Scoring**: 0-100 score based on match quality
- **Redis Caching**: 7-day TTL to avoid redundant API calls
- **Bulk Queue Processing**: BullMQ integration for large geocoding jobs
- **Address Normalization**: Expand abbreviations, normalize postal codes
- **Reverse Geocoding**: Convert coordinates to human-readable address
- **Provider Health Tracking**: Prometheus metrics for success rates

**Use Cases:**

- Bulk geocoding of voter files
- Real-time address validation during data entry
- Map marker placement for locations
- Address autocomplete (future)
- Spatial filtering by coordinates
- Walk sheet generation with accurate maps

## Architecture

```mermaid
graph TD
    A[Location Service] -->|Geocode Request| B[Geocoding Service]
    B -->|Check Cache| C[(Redis Cache)]
    C -->|Cache Hit| A
    C -->|Cache Miss| D[Provider Chain]

    D -->|Try Provider 1| E[Google Geocoding API]
    E -->|Success| F[Confidence Scorer]
    E -->|Fail| G[Try Provider 2]
    G -->|Mapbox| H[Mapbox Geocoding API]
    H -->|Success| F
    H -->|Fail| I[Try Provider 3]
    I -->|Nominatim| J[Nominatim API]
    J -->|Success| F
    J -->|Fail| K[Try Provider 4]
    K -->|Photon| L[Photon API]
    L -->|Success| F
    L -->|Fail| M[Try Provider 5]
    M -->|LocationIQ| N[LocationIQ API]
    N -->|Success| F
    N -->|Fail| O[Try Provider 6]
    O -->|ArcGIS| P[ArcGIS API]
    P -->|Success| F
    P -->|Fail| Q[Geocoding Failed]

    F -->|Store Result| C
    F -->|Return| A

    R[Bulk Geocode Job] -->|Queue| S[(BullMQ)]
    S -->|Process Batch| B
    B -->|Rate Limit| T[Rate Limiter]
    T -->|Allow| D

    style C fill:#fff4e1
    style S fill:#fff4e1
    style E fill:#e8f5e9
    style H fill:#e8f5e9
    style J fill:#e8f5e9
    style L fill:#e8f5e9
    style N fill:#e8f5e9
    style P fill:#e8f5e9
```

**Flow Description:**

1. **Location service requests geocode** → Geocoding service checks Redis cache
2. **Cache miss** → Try providers in configured order (Google → Mapbox → Nominatim → Photon → LocationIQ → ArcGIS)
3. **Provider success** → Calculate confidence score (0-100) based on match type
4. **Cache result** → Store in Redis with 7-day TTL
5. **Bulk geocoding** → BullMQ worker processes batches with rate limiting
6. **Metrics tracking** → Prometheus counters for provider outcomes and cache hits/misses (the cache hit rate is derived from them)

## Database Models

### GeocodeProvider Enum

See [Location Model Documentation](../../database/models/map.md#location-model) for full schema.

**Provider Enum Values:**

```prisma
enum GeocodeProvider {
  GOOGLE
  MAPBOX
  NOMINATIM
  PHOTON
  LOCATIONIQ
  ARCGIS
  UNKNOWN
}
```

**Location Model Geocoding Fields:**

- `latitude` / `longitude`: Decimal coordinates from geocoding
- `geocodeConfidence`: Integer 0-100 (90-100 = high, 70-89 = medium, 50-69 = low, <50 = unreliable)
- `geocodeProvider`: Which provider successfully geocoded
- `geocodeAttempts`: Number of failed attempts (for retry logic)
- `lastGeocodeAttempt`: Timestamp of last geocoding attempt

**Related Models:**

- [Location](../../database/models/map.md#location-model): Stores geocoded coordinates
- [LocationHistory](../../database/models/map.md#locationhistory-model): Audit trail for geocoding changes

## API Endpoints

See [Geocoding Backend Module Documentation](../../backend/modules/map/geocoding.md) for full API reference.

**Geocoding Endpoints:**

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| POST | `/api/map/locations/geocode` | MAP_ADMIN | Geocode single address |
| POST | `/api/map/locations/reverse-geocode` | MAP_ADMIN | Reverse geocode lat/lng to address |
| POST | `/api/map/locations/bulk-geocode/start` | MAP_ADMIN | Start bulk geocoding job (BullMQ) |
| GET | `/api/map/locations/bulk-geocode/status` | MAP_ADMIN | Check bulk geocoding job status |
| POST | `/api/map/locations/bulk-geocode/cancel` | MAP_ADMIN | Cancel running bulk geocoding job |

**Request/Response Examples:**

**Single Geocode Request:**

```json
POST /api/map/locations/geocode
{
  "address": "123 Main Street, Ottawa, ON K1A 0B1"
}

// Response
{
  "latitude": 45.4215,
  "longitude": -75.6972,
  "confidence": 95,
  "provider": "GOOGLE",
  "formattedAddress": "123 Main St, Ottawa, ON K1A 0B1, Canada"
}
```

**Bulk Geocode Job:**

```json
POST /api/map/locations/bulk-geocode/start
{
  "confidenceThreshold": 70,
  "provider": "GOOGLE",
  "batchSize": 50
}

// Response
{
  "jobId": "bulk-geocode-uuid",
  "status": "queued",
  "totalLocations": 1234
}
```

## Configuration

### Environment Variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `GEOCODING_ENABLED` | boolean | `true` | Enable geocoding services |
| `GEOCODING_CACHE_ENABLED` | boolean | `true` | Cache results in Redis |
| `GEOCODING_CACHE_TTL_HOURS` | number | `168` | Cache TTL (7 days) |
| `GEOCODING_PROVIDERS` | string | `GOOGLE,MAPBOX,NOMINATIM,PHOTON,LOCATIONIQ,ARCGIS` | Provider order (comma-separated) |
| `GOOGLE_MAPS_API_KEY` | string | - | Google Geocoding API key (required if Google enabled) |
| `MAPBOX_ACCESS_TOKEN` | string | - | Mapbox API token (required if Mapbox enabled) |
| `LOCATIONIQ_API_KEY` | string | - | LocationIQ API key (required if LocationIQ enabled) |
| `NOMINATIM_BASE_URL` | string | `https://nominatim.openstreetmap.org` | Nominatim API URL |
| `PHOTON_BASE_URL` | string | `https://photon.komoot.io` | Photon API URL |
| `ARCGIS_BASE_URL` | string | `https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer` | ArcGIS API URL |

### Provider Configuration

**Provider Selection Strategy:**

1. **Free tier exhausted?** Remove provider from chain
2. **Rate limit hit?** Skip provider temporarily (5-minute cooldown)
3. **Service down?** Skip provider (exponential backoff)
4. **Low confidence?** Try next provider

**Provider Priority (Default):**

1. **Google**: Best accuracy, paid API (free $200/month credit)
2. **Mapbox**: Good accuracy, generous free tier (100k/month)
3. **Nominatim**: Free, moderate accuracy, 1 req/sec limit
4. **Photon**: Free, fast, good for European addresses
5. **LocationIQ**: Free tier (5k/day), good international coverage
6. **ArcGIS**: Free tier (20k/month), good US coverage

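The skip rules in the selection strategy above (a fixed cooldown after a rate limit, exponential backoff after an outage) could be tracked with a small in-memory helper. This is a minimal sketch, not the service's actual implementation: `ProviderCooldowns` and its method names are hypothetical, and the 30-second base backoff and 1-hour cap are assumed values. Only the 5-minute rate-limit cooldown comes from this document.

```typescript
type FailureKind = 'rate_limit' | 'outage';

// Hypothetical helper: remembers until when each provider should be skipped.
class ProviderCooldowns {
  private readonly skipUntil = new Map<string, number>();
  private readonly outageCount = new Map<string, number>();
  private readonly rateLimitCooldownMs: number;
  private readonly baseBackoffMs: number;
  private readonly maxBackoffMs: number;

  constructor(
    rateLimitCooldownMs: number = 5 * 60 * 1000, // 5-minute cooldown (from the strategy above)
    baseBackoffMs: number = 30 * 1000, // assumed starting backoff
    maxBackoffMs: number = 60 * 60 * 1000, // assumed upper bound
  ) {
    this.rateLimitCooldownMs = rateLimitCooldownMs;
    this.baseBackoffMs = baseBackoffMs;
    this.maxBackoffMs = maxBackoffMs;
  }

  recordFailure(provider: string, kind: FailureKind, now: number = Date.now()): void {
    if (kind === 'rate_limit') {
      this.skipUntil.set(provider, now + this.rateLimitCooldownMs);
      return;
    }
    // Outage: double the wait on every consecutive failure, up to the cap.
    const failures = (this.outageCount.get(provider) ?? 0) + 1;
    this.outageCount.set(provider, failures);
    const backoff = Math.min(this.maxBackoffMs, this.baseBackoffMs * 2 ** (failures - 1));
    this.skipUntil.set(provider, now + backoff);
  }

  recordSuccess(provider: string): void {
    this.outageCount.delete(provider);
    this.skipUntil.delete(provider);
  }

  isAvailable(provider: string, now: number = Date.now()): boolean {
    return now >= (this.skipUntil.get(provider) ?? 0);
  }

  // Filters the configured chain down to providers that may be tried right now.
  available(chain: string[], now: number = Date.now()): string[] {
    return chain.filter((provider) => this.isAvailable(provider, now));
  }
}
```

With a helper like this, the fallback loop would iterate over `available(providers)` instead of the full chain. State kept in process memory is lost on restart, so a multi-instance deployment would likely need to store it in Redis instead.
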
### Confidence Scoring Rules

**Confidence Score Calculation:**

| Match Type | Google | Mapbox | Nominatim | Photon | LocationIQ | ArcGIS |
|------------|--------|--------|-----------|--------|------------|--------|
| Rooftop (exact address) | 95-100 | 95-100 | 90-95 | 90-95 | 90-95 | 95-100 |
| Interpolated | 85-94 | 85-94 | 80-89 | 80-89 | 80-89 | 85-94 |
| Street-level | 70-84 | 70-84 | 65-79 | 65-79 | 65-79 | 70-84 |
| Postal code | 50-69 | 50-69 | 45-64 | 45-64 | 45-64 | 50-69 |
| City | 30-49 | 30-49 | 25-44 | 25-44 | 25-44 | 30-49 |
| Province/State | 10-29 | 10-29 | 5-24 | 5-24 | 5-24 | 10-29 |
| Country | 0-9 | 0-9 | 0-4 | 0-4 | 0-4 | 0-9 |

**Confidence Thresholds:**

- **High** (90-100): Exact address match, suitable for door-knocking
- **Medium** (70-89): Street-level or interpolated, suitable for mapping
- **Low** (50-69): Postal code or city-level, needs manual verification
- **None** (<50): Unreliable, should re-geocode or manually enter coordinates

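The thresholds above map directly onto a small classification function. The sketch below is illustrative only: `confidenceTier` and the `ConfidenceTier` type are hypothetical names, not part of the documented service.

```typescript
type ConfidenceTier = 'high' | 'medium' | 'low' | 'none';

// Maps a 0-100 confidence score onto the tiers listed above.
function confidenceTier(score: number): ConfidenceTier {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`Confidence must be between 0 and 100, got ${score}`);
  }
  if (score >= 90) return 'high'; // exact match, suitable for door-knocking
  if (score >= 70) return 'medium'; // street-level or interpolated
  if (score >= 50) return 'low'; // needs manual verification
  return 'none'; // unreliable
}
```

For example, the sample response in the API section (`"confidence": 95`) falls in the `high` tier, while a bulk job with a confidence threshold of 70 would re-geocode everything in the `low` and `none` tiers.
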
## Admin Workflow

### Single Address Geocoding

**Step 1: Enter Address**

On the LocationsPage create/edit form, enter the address:

```
Address: 123 Main Street
Postal Code: K1A 0B1
```

**Step 2: Click Geocode Button**

Click the **Geocode** button below the address field.

**Step 3: View Results**

The system displays:

- **Latitude/Longitude**: Auto-populated
- **Confidence Score**: 95% (High)
- **Provider**: Google
- **Formatted Address**: 123 Main St, Ottawa, ON K1A 0B1, Canada

**Step 4: Save Location**

Click **Save** to create or update the location with the geocoded coordinates.

### Bulk Re-Geocoding

**Use Case:** Re-geocode locations with missing or low-confidence coordinates.

**Step 1: Open Bulk Geocode Modal**

On LocationsPage, click the **Bulk Re-Geocode** button.

**Step 2: Configure Job**

Set parameters:

- **Confidence Threshold**: Only geocode locations below this score (e.g., 70)
- **Missing Only**: Only geocode locations without coordinates
- **Provider**: Choose preferred provider (or use default chain)
- **Batch Size**: Locations per batch (default: 50)

**Step 3: Start Job**

Click **Start Job** to queue the job in BullMQ.

**Step 4: Monitor Progress**

View real-time progress:

- **Completed**: 234 / 1000 locations
- **Failed**: 12 locations
- **Progress**: 23.4%
- **ETA**: 8 minutes

**Step 5: Review Results**

After the job completes:

- **Success Rate**: 98.8%
- **Average Confidence**: 87.3
- **Failed Addresses**: Download CSV of failures

**Step 6: Retry Failures (Optional)**

For failed addresses:

1. Download failure CSV
2. Manually verify addresses
3. Fix typos/formatting issues
4. Re-import CSV
5. Run bulk geocode again

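The progress, ETA, and success-rate figures shown in Steps 4 and 5 can all be derived from the job's running counters. The following is a rough sketch under stated assumptions: `summarizeProgress` and its types are hypothetical, progress is assumed to count every processed location (successful or failed) as the worker's progress update does, and the ETA is a simple linear extrapolation from elapsed time.

```typescript
interface BulkProgress {
  completed: number; // locations geocoded successfully
  failed: number; // locations that could not be geocoded
  total: number; // locations in the job
}

interface ProgressSummary {
  processed: number;
  progressPercent: number;
  successRatePercent: number;
  etaSeconds: number | null; // null until at least one location is processed
}

const roundTo1 = (n: number): number => Math.round(n * 10) / 10;

function summarizeProgress(p: BulkProgress, elapsedMs?: number): ProgressSummary {
  if (p.total <= 0) throw new RangeError('total must be greater than zero');
  const processed = p.completed + p.failed;
  const remaining = Math.max(0, p.total - processed);
  // Linear extrapolation: average time per processed location times what is left.
  const etaSeconds =
    elapsedMs !== undefined && processed > 0
      ? Math.round(((elapsedMs / processed) * remaining) / 1000)
      : null;
  return {
    processed,
    progressPercent: roundTo1((processed / p.total) * 100),
    successRatePercent: processed === 0 ? 0 : roundTo1((p.completed / processed) * 100),
    etaSeconds,
  };
}
```

For a finished job with 988 successes and 12 failures out of 1000, this yields a success rate of 98.8%, matching the example in Step 5.
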
### Reverse Geocoding

**Use Case:** Convert map click coordinates to an address.

**Step 1: Click Map**

On AdminMapView, click a location to get its lat/lng.

**Step 2: Reverse Geocode**

Click the **Reverse Geocode** button in the popup.

**Step 3: View Address**

The system displays:

```
Address: 123 Main St
City: Ottawa
Province: ON
Country: Canada
```

**Step 4: Create Location**

Click **Create Location** to auto-fill the address form.

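This document does not show which provider handles reverse geocoding, so the following is only a sketch of how the lookup in Step 2 could be made against Nominatim's `/reverse` endpoint and mapped onto the fields shown in Step 3. `buildNominatimReverseUrl`, `toDisplayAddress`, and the `NominatimAddress` interface are hypothetical names. The address keys (`house_number`, `road`, `city`, `town`, `village`, `state`, `country`) follow Nominatim's address-details output, which returns full region names such as "Ontario" rather than abbreviations like "ON".

```typescript
// Subset of the "address" object in a Nominatim reverse response (addressdetails=1).
interface NominatimAddress {
  house_number?: string;
  road?: string;
  city?: string;
  town?: string;
  village?: string;
  state?: string;
  country?: string;
}

// Builds the request URL for a reverse lookup; validates coordinate ranges first.
function buildNominatimReverseUrl(
  latitude: number,
  longitude: number,
  baseUrl: string = 'https://nominatim.openstreetmap.org',
): string {
  if (!Number.isFinite(latitude) || latitude < -90 || latitude > 90) {
    throw new RangeError(`Invalid latitude: ${latitude}`);
  }
  if (!Number.isFinite(longitude) || longitude < -180 || longitude > 180) {
    throw new RangeError(`Invalid longitude: ${longitude}`);
  }
  const url = new URL(`${baseUrl}/reverse`);
  url.searchParams.set('lat', String(latitude));
  url.searchParams.set('lon', String(longitude));
  url.searchParams.set('format', 'jsonv2');
  url.searchParams.set('addressdetails', '1');
  return url.toString();
}

// Maps Nominatim address parts onto the four fields displayed to the admin.
function toDisplayAddress(a: NominatimAddress) {
  return {
    address: [a.house_number, a.road].filter(Boolean).join(' '),
    city: a.city ?? a.town ?? a.village ?? '',
    province: a.state ?? '',
    country: a.country ?? '',
  };
}
```

As with forward geocoding against the public Nominatim instance, the request would need an identifying `User-Agent` header and must respect the 1 req/sec limit.
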
## Code Examples

### Geocoding Service (Backend)

```typescript
// api/src/modules/map/geocoding/geocoding.service.ts
export interface GeocodeResult {
  latitude: number;
  longitude: number;
  confidence: number;
  provider: GeocodeProvider;
  formattedAddress?: string;
}

async function geocode(address: string): Promise<GeocodeResult> {
  // Check Redis cache first
  const cached = await getCachedResult(address);
  if (cached) {
    logger.debug('Geocode cache hit', { address });
    return cached;
  }

  // Normalize address (expand abbreviations, fix postal code)
  const normalized = normalizeAddress(address);

  // Try providers in order (entries are trimmed so "GOOGLE, MAPBOX" also works)
  const providers = env.GEOCODING_PROVIDERS.split(',')
    .map((name) => name.trim())
    .filter(Boolean);
  let lastError: Error | null = null;

  for (const providerName of providers) {
    try {
      const result = await tryProvider(providerName, normalized);

      // Accept only results at or above the "Low" threshold (50);
      // anything lower falls through to the next provider
      if (result.confidence >= 50) {
        // Cache successful result
        await setCachedResult(address, result);
        logger.info('Geocoded address', {
          address,
          provider: result.provider,
          confidence: result.confidence,
        });
        return result;
      }
    } catch (err) {
      lastError = err as Error;
      logger.warn(`Provider ${providerName} failed`, { address, error: err });
      continue;
    }
  }

  throw new AppError(
    500,
    'All geocoding providers failed',
    'GEOCODING_FAILED',
    { address, lastError: lastError?.message }
  );
}
```

### Provider Chain Implementation

```typescript
// api/src/modules/map/geocoding/geocoding.service.ts
async function tryProvider(
  providerName: string,
  address: string
): Promise<GeocodeResult> {
  switch (providerName.toUpperCase()) {
    case 'GOOGLE':
      return await geocodeWithGoogle(address);
    case 'MAPBOX':
      return await geocodeWithMapbox(address);
    case 'NOMINATIM':
      return await geocodeWithNominatim(address);
    case 'PHOTON':
      return await geocodeWithPhoton(address);
    case 'LOCATIONIQ':
      return await geocodeWithLocationIQ(address);
    case 'ARCGIS':
      return await geocodeWithArcGIS(address);
    default:
      throw new Error(`Unknown provider: ${providerName}`);
  }
}
```

### Google Geocoding Provider

```typescript
// api/src/modules/map/geocoding/geocoding.service.ts
async function geocodeWithGoogle(address: string): Promise<GeocodeResult> {
  if (!env.GOOGLE_MAPS_API_KEY) {
    throw new Error('Google Maps API key not configured');
  }

  const url = new URL('https://maps.googleapis.com/maps/api/geocode/json');
  url.searchParams.set('address', address);
  url.searchParams.set('key', env.GOOGLE_MAPS_API_KEY);

  const response = await fetch(url.toString());
  const data = await response.json();

  if (data.status !== 'OK' || !data.results?.[0]) {
    throw new Error(`Google geocoding failed: ${data.status}`);
  }

  const result = data.results[0];
  const location = result.geometry.location;

  // Calculate confidence based on location_type
  // (any other type, such as APPROXIMATE, keeps the default of 50)
  let confidence = 50;
  if (result.geometry.location_type === 'ROOFTOP') {
    confidence = 95;
  } else if (result.geometry.location_type === 'RANGE_INTERPOLATED') {
    confidence = 85;
  } else if (result.geometry.location_type === 'GEOMETRIC_CENTER') {
    confidence = 70;
  }

  return {
    latitude: location.lat,
    longitude: location.lng,
    confidence,
    provider: GeocodeProvider.GOOGLE,
    formattedAddress: result.formatted_address,
  };
}
```

### Mapbox Geocoding Provider

```typescript
// api/src/modules/map/geocoding/geocoding.service.ts
async function geocodeWithMapbox(address: string): Promise<GeocodeResult> {
  if (!env.MAPBOX_ACCESS_TOKEN) {
    throw new Error('Mapbox access token not configured');
  }

  const encodedAddress = encodeURIComponent(address);
  const url = `https://api.mapbox.com/geocoding/v5/mapbox.places/${encodedAddress}.json?access_token=${env.MAPBOX_ACCESS_TOKEN}`;

  const response = await fetch(url);
  const data = await response.json();

  if (!data.features?.[0]) {
    throw new Error('Mapbox geocoding failed: no results');
  }

  const feature = data.features[0];
  const [lng, lat] = feature.center; // Mapbox returns [longitude, latitude]

  // Calculate confidence based on place_type
  let confidence = 50;
  if (feature.place_type.includes('address')) {
    confidence = 95;
  } else if (feature.place_type.includes('place')) {
    confidence = 60;
  } else if (feature.place_type.includes('postcode')) {
    confidence = 55;
  }

  // Boost confidence for exact match
  if (feature.relevance >= 0.9) {
    confidence = Math.min(100, confidence + 10);
  }

  return {
    latitude: lat,
    longitude: lng,
    confidence,
    provider: GeocodeProvider.MAPBOX,
    formattedAddress: feature.place_name,
  };
}
```

### Nominatim Geocoding Provider

```typescript
// api/src/modules/map/geocoding/geocoding.service.ts
async function geocodeWithNominatim(address: string): Promise<GeocodeResult> {
  const baseUrl = env.NOMINATIM_BASE_URL || 'https://nominatim.openstreetmap.org';
  const url = new URL(`${baseUrl}/search`);
  url.searchParams.set('q', address);
  url.searchParams.set('format', 'json');
  url.searchParams.set('limit', '1');

  const response = await fetch(url.toString(), {
    headers: { 'User-Agent': 'Changemaker Lite/2.0' }, // Required by Nominatim
  });

  const data = await response.json();

  if (!data?.[0]) {
    throw new Error('Nominatim geocoding failed: no results');
  }

  const result = data[0];
  const lat = parseFloat(result.lat); // Nominatim returns coordinates as strings
  const lng = parseFloat(result.lon);

  // Calculate confidence based on osm_type and importance
  let confidence = 50;
  if (result.osm_type === 'node' && result.importance > 0.5) {
    confidence = 90;
  } else if (result.osm_type === 'way' && result.importance > 0.4) {
    confidence = 80;
  } else if (result.importance > 0.3) {
    confidence = 70;
  }

  return {
    latitude: lat,
    longitude: lng,
    confidence,
    provider: GeocodeProvider.NOMINATIM,
    formattedAddress: result.display_name,
  };
}
```

### Address Normalization

```typescript
// api/src/modules/map/geocoding/geocoding.service.ts
const abbreviations: Record<string, string> = {
  // Street types
  'st': 'street',
  'ave': 'avenue',
  'blvd': 'boulevard',
  'dr': 'drive',
  'rd': 'road',
  'ln': 'lane',
  'ct': 'court',
  // Directional suffixes
  'n': 'north',
  'ne': 'northeast',
  'e': 'east',
  'se': 'southeast',
  's': 'south',
  'sw': 'southwest',
  'w': 'west',
  'nw': 'northwest',
};

function normalizeAddress(address: string): string {
  let normalized = address.trim().toLowerCase();

  // Expand abbreviations
  for (const [abbr, full] of Object.entries(abbreviations)) {
    // Match whole words only; the apostrophe guard keeps the possessive
    // in "john's" from being expanded to "john'south"
    const regex = new RegExp(`(?<![\\w'])${abbr}(?![\\w'])`, 'gi');
    normalized = normalized.replace(regex, full);
  }

  // Normalize postal code (K1A0B1 → K1A 0B1)
  normalized = normalized.replace(
    /\b([A-Za-z]\d[A-Za-z])\s*(\d[A-Za-z]\d)\b/g,
    (_match, p1, p2) => `${p1.toUpperCase()} ${p2.toUpperCase()}`
  );

  // Remove extra whitespace
  normalized = normalized.replace(/\s+/g, ' ').trim();

  return normalized;
}
```

### Redis Caching

```typescript
// api/src/modules/map/geocoding/geocoding.service.ts
import crypto from 'crypto';

const CACHE_KEY_PREFIX = 'GEOCODE_CACHE:';

function hashAddress(address: string): string {
  return crypto.createHash('sha256').update(address).digest('hex').substring(0, 16);
}

async function getCachedResult(address: string): Promise<GeocodeResult | null> {
  if (env.GEOCODING_CACHE_ENABLED !== 'true') return null;

  try {
    const key = `${CACHE_KEY_PREFIX}${hashAddress(address)}`;
    const cached = await redis.get(key);

    if (!cached) {
      cm_geocode_cache_misses.inc();
      return null;
    }

    const parsed = JSON.parse(cached);
    cm_geocode_cache_hits.inc();
    return parsed;
  } catch (err) {
    logger.warn('Failed to get cached geocode result:', err);
    return null;
  }
}

async function setCachedResult(address: string, result: GeocodeResult): Promise<void> {
  if (env.GEOCODING_CACHE_ENABLED !== 'true') return;

  try {
    const key = `${CACHE_KEY_PREFIX}${hashAddress(address)}`;
    const ttlSeconds = env.GEOCODING_CACHE_TTL_HOURS * 60 * 60;

    await redis.setex(key, ttlSeconds, JSON.stringify(result));
  } catch (err) {
    logger.warn('Failed to cache geocode result:', err);
  }
}
```

### Bulk Geocoding Job (BullMQ)

```typescript
// api/src/services/geocode-queue.service.ts
import Bull from 'bull';

export const geocodeQueue = new Bull('geocode-queue', env.REDIS_URL, {
  defaultJobOptions: {
    attempts: 3,
    backoff: { type: 'exponential', delay: 5000 },
    removeOnComplete: 100,
    removeOnFail: false,
  },
});

// Bulk geocode job processor
geocodeQueue.process(async (job) => {
  // Fall back to the documented default batch size when none is supplied
  const { locationIds, provider, batchSize = 50 } = job.data;

  logger.info('Processing bulk geocode job', {
    jobId: job.id,
    totalLocations: locationIds.length,
  });

  let completed = 0;
  let failed = 0;

  for (let i = 0; i < locationIds.length; i += batchSize) {
    const batch = locationIds.slice(i, i + batchSize);

    for (const locationId of batch) {
      try {
        const location = await prisma.location.findUnique({
          where: { id: locationId },
        });

        if (!location?.address) {
          failed++;
          continue;
        }

        const result = await geocodingService.geocode(location.address);

        await prisma.location.update({
          where: { id: locationId },
          data: {
            latitude: result.latitude,
            longitude: result.longitude,
            geocodeConfidence: result.confidence,
            geocodeProvider: result.provider,
            lastGeocodeAttempt: new Date(),
          },
        });

        completed++;
      } catch (err) {
        logger.warn('Failed to geocode location', { locationId, error: err });
        failed++;
      }
    }

    // Update job progress (counts processed locations, successful or not)
    const progress = ((i + batch.length) / locationIds.length) * 100;
    await job.progress(progress);

    // Rate limiting: wait 1s between batches
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }

  return { completed, failed, total: locationIds.length };
});
```

## Troubleshooting

### Issue: All Providers Failing

**Symptoms:**

- "All geocoding providers failed" error
- Geocode confidence always 0
- No results from any provider

**Causes:**

- All API keys invalid or missing
- Network connectivity issues
- Rate limits exceeded on all providers
- Address format not recognized

**Solutions:**

1. **Verify API keys**:

   ```bash
   # Check .env file
   grep "GOOGLE_MAPS_API_KEY\|MAPBOX_ACCESS_TOKEN\|LOCATIONIQ_API_KEY" .env

   # Test Google API key directly
   curl "https://maps.googleapis.com/maps/api/geocode/json?address=123+Main+St&key=YOUR_KEY"
   ```

2. **Check provider health**:

   ```bash
   # View Prometheus metrics
   curl http://localhost:4000/metrics | grep cm_geocode

   # View API logs
   docker compose logs -f api | grep geocode
   ```

3. **Test with free provider (Nominatim)**:

   ```bash
   # Temporarily use only Nominatim (set in .env, then restart the API)
   GEOCODING_PROVIDERS=NOMINATIM

   # Test endpoint
   curl -X POST http://localhost:4000/api/map/locations/geocode \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"address":"123 Main Street, Ottawa, ON"}'
   ```

### Issue: Low Confidence Scores

**Symptoms:**

- Geocode confidence consistently <70
- Coordinates appear incorrect on map
- Addresses geocoded to city-level instead of street-level

**Causes:**

- Address format ambiguous (missing street type, postal code)
- Provider using city centroid instead of exact address
- International address format not recognized
- Address doesn't exist in provider database

**Solutions:**

1. **Improve address format**:

   ```typescript
   // Bad: missing postal code, street type
   "123 Main, Ottawa"

   // Good: full Canadian address
   "123 Main Street, Ottawa, ON K1A 0B1"
   ```

2. **Try different providers**:

   ```bash
   # Google/Mapbox best for North American addresses
   GEOCODING_PROVIDERS=GOOGLE,MAPBOX,NOMINATIM

   # Nominatim/Photon better for European addresses
   GEOCODING_PROVIDERS=NOMINATIM,PHOTON,MAPBOX
   ```

3. **Manual verification**:

   For critical addresses, manually verify coordinates:

   ```bash
   # Reverse geocode to check accuracy
   curl -X POST http://localhost:4000/api/map/locations/reverse-geocode \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"latitude":45.4215,"longitude":-75.6972}'
   ```

### Issue: Bulk Geocoding Job Stuck

**Symptoms:**

- Bulk geocode progress stuck at X%
- Job running for hours without completing
- BullMQ job marked as "active" but not processing

**Causes:**

- Worker crashed mid-job
- Rate limit hit (paused for cooldown)
- Redis connection lost
- Job timeout (default: 30min)

**Solutions:**

1. **Check job status**:

   ```bash
   # View BullMQ jobs in Redis
   docker compose exec redis redis-cli KEYS "bull:geocode-queue:*"

   # Get job details (jobs are stored as Redis hashes, so use HGETALL rather than GET)
   docker compose exec redis redis-cli HGETALL "bull:geocode-queue:JOB_ID"
   ```

2. **Restart worker**:

   ```bash
   # Restart API service (worker runs in API container)
   docker compose restart api
   ```

3. **Cancel stuck job**:

   ```bash
   # Via API endpoint
   curl -X POST http://localhost:4000/api/map/locations/bulk-geocode/cancel \
     -H "Authorization: Bearer YOUR_TOKEN"

   # Or manually in Redis
   docker compose exec redis redis-cli DEL "bull:geocode-queue:ACTIVE_JOB_ID"
   ```

4. **Increase timeout**:

   ```typescript
   // api/src/services/geocode-queue.service.ts
   defaultJobOptions: {
     timeout: 3600000, // 1 hour (was 30min)
   }
   ```

### Issue: Cache Not Working

**Symptoms:**

- `cm_geocode_cache_hits` metric always 0
- Same address geocoded multiple times
- High API usage for repeated addresses

**Causes:**

- Redis not running
- `GEOCODING_CACHE_ENABLED=false`
- Cache keys expiring too quickly
- Address normalization inconsistent (cache miss due to formatting)

**Solutions:**

1. **Verify Redis connection**:

   ```bash
   # Check Redis is running
   docker compose ps redis

   # Test Redis connection from API
   docker compose exec api node -e "const redis = require('./src/config/redis').redis; redis.ping().then(console.log);"
   ```

2. **Check cache keys**:

   ```bash
   # View cached geocode results
   docker compose exec redis redis-cli KEYS "GEOCODE_CACHE:*"

   # Get sample cached result
   docker compose exec redis redis-cli GET "GEOCODE_CACHE:abc123def456"
   ```

3. **Enable caching**:

   ```bash
   # Verify in .env
   GEOCODING_CACHE_ENABLED=true
   GEOCODING_CACHE_TTL_HOURS=168 # 7 days
   ```

4. **Clear cache to test**:

   ```bash
   # Delete all geocode cache keys (-T disables the pseudo-TTY so the output can be piped cleanly)
   docker compose exec -T redis redis-cli --scan --pattern "GEOCODE_CACHE:*" | xargs docker compose exec -T redis redis-cli DEL
   ```

## Performance Considerations

### Provider Rate Limits

**Free Tier Limits:**

| Provider | Free Tier | Rate Limit | Best For |
|----------|-----------|------------|----------|
| **Google** | $200/month credit (~28k reqs) | 50 req/sec | North American addresses |
| **Mapbox** | 100,000/month | 600 req/min | Global coverage |
| **Nominatim** | Unlimited | 1 req/sec | Europe, low-volume |
| **Photon** | Unlimited | No limit\* | Europe, high-volume |
| **LocationIQ** | 5,000/day | 2 req/sec | Testing, low-volume |
| **ArcGIS** | 20,000/month | 50 req/sec | US addresses |

\*Self-hosted Photon recommended for production high-volume use.

**Best Practices:**

1. **Enable Redis caching** (7-day TTL reduces API calls by ~80%)
2. **Use bulk geocoding jobs** (BullMQ queue with 1s delay between batches)
3. **Prefer NAR imports** (coordinates included, no geocoding needed)
4. **Set up Photon self-hosted** (for high-volume European campaigns)

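The per-provider limits in the table above translate into a minimum spacing between requests: 1000 ms for Nominatim (1 req/sec), 500 ms for LocationIQ (2 req/sec), 100 ms for Mapbox (600 req/min). A minimal sketch of a limiter that enforces such spacing follows. `MinIntervalLimiter` is a hypothetical name and is not the rate limiter the service actually uses; it also only coordinates callers within a single process.

```typescript
// Hypothetical limiter: hands out request slots at least `minIntervalMs` apart.
class MinIntervalLimiter {
  private nextSlot = 0;
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Reserves the next free slot and returns how long the caller must wait (ms).
  reserve(now: number = Date.now()): number {
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.minIntervalMs;
    return start - now;
  }

  // Convenience wrapper: resolves once the reserved slot has arrived.
  async wait(): Promise<void> {
    const delay = this.reserve();
    if (delay > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Example: one limiter per provider, sized from the rate-limit table.
const nominatimLimiter = new MinIntervalLimiter(1000);
```

A provider function would call `await nominatimLimiter.wait()` immediately before its `fetch`. Because each call reserves its slot synchronously, concurrent callers queue up in order rather than all firing when the first slot opens.
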
### Caching Strategy

**Cache Hit Rate Optimization:**

```typescript
// Normalize address before hashing to improve cache hits
function hashAddress(address: string): string {
  // Remove punctuation, lowercase, trim
  const normalized = address
    .toLowerCase()
    .replace(/[.,]/g, '')
    .replace(/\s+/g, ' ')
    .trim();

  return crypto.createHash('sha256').update(normalized).digest('hex').substring(0, 16);
}
```

**TTL Configuration:**

- **Development**: 24 hours (test address changes)
- **Production**: 7 days (balance freshness vs API quota)
- **NAR imports**: 30 days (addresses rarely change)

### Bulk Geocoding Performance

**Batch Size Tuning:**

```typescript
// Small batches: better for rate limits, slower overall
batchSize: 10, // 1 req/sec = 10 locations per 10s batch

// Large batches: faster, but may hit rate limits
batchSize: 100, // 50 req/sec = 100 locations per 2s batch
```

**Optimal Settings:**

| Provider | Batch Size | Delay Between Batches |
|----------|------------|-----------------------|
| Google | 50 | 1s |
| Mapbox | 100 | 10s |
| Nominatim | 1 | 1s (strict rate limit) |
| Photon | 50 | 0s (self-hosted) |

**Prometheus Metrics:**

```prometheus
# Cache hit rate (target: >80%)
rate(cm_geocode_cache_hits_total[5m]) /
(rate(cm_geocode_cache_hits_total[5m]) + rate(cm_geocode_cache_misses_total[5m]))

# Successful geocodes per second, by provider
# (divide by each provider's total attempts to get a success rate; target: >95%)
sum by (provider) (rate(cm_geocode_success_total[5m]))
```

## Related Documentation

**Backend Modules:**

- [Geocoding Backend Module](../../backend/modules/map/geocoding.md): Full service implementation
- [Locations Service](../../backend/modules/map/locations.md): Geocoding integration
- [Geocode Queue Service](../../backend/modules/services/geocode-queue.md): BullMQ worker

**Frontend Pages:**

- [LocationsPage](../../frontend/pages/admin/locations-page.md): Geocoding UI
- [Data Quality Dashboard](../../frontend/pages/admin/data-quality-dashboard.md): Confidence metrics

**Database:**

- [Location Model](../../database/models/map.md#location-model): Geocoding fields
- [GeocodeProvider Enum](../../database/models/map.md#geocodeprovider-enum): Provider types

**Features:**

- [Locations](./locations.md): Location management system
- [Data Quality Dashboard](./data-quality.md): Geocoding quality metrics
- [NAR Import](./nar-import.md): Canadian electoral data (pre-geocoded)

**Configuration:**

- [Environment Variables](../../deployment/configuration.md#geocoding): Provider setup
- [Redis Configuration](../../deployment/configuration.md#redis): Cache setup