
# NAR Import System
## Overview
The National Address Register (NAR) import system enables bulk import of Canadian electoral data from Elections Canada. The system supports the 2025 NAR format with server-side streaming import, coordinate projection conversion, and comprehensive filtering options.
**Key Features:**
- Server-side streaming import (handles large datasets)
- NAR 2025 format support (BG_X/BG_Y Lambert projection)
- Address + Location file joining on LOC_GUID
- Proj4 coordinate conversion (EPSG:3347 → WGS84)
- Province selector (13 provinces/territories)
- Filtering: city, postal code, cut boundary, residential-only
- Multi-part file handling (large provinces)
- Progress tracking and error reporting
- Import statistics and validation
**Use Cases:**
- Initial campaign database setup
- Electoral district targeting
- NAR data updates (new redistribution)
- Multi-region campaign expansion
- Address database verification
**Architecture Highlights:**
- Streaming CSV parser (avoids memory limits)
- File-based LOC_GUID join
- Real-time coordinate projection
- Point-in-polygon cut filtering
- Transaction batching (500 records/commit)
- Duplicate prevention via UPSERT
## Architecture
```mermaid
flowchart TB
  subgraph Admin Interface
    Admin[Admin User]
    LocationsPage[LocationsPage - NAR Tab]
  end
  subgraph API Layer
    DatasetsAPI["/api/locations/nar/datasets"]
    ImportAPI["/api/locations/nar/import"]
  end
  subgraph NAR Import Service
    Scanner[File Scanner]
    Reader[CSV Stream Reader]
    Joiner[Address+Location Joiner]
    Converter[Coordinate Converter]
    Filter[Filter Pipeline]
    Importer[Bulk Importer]
  end
  subgraph File System
    DataDir[/data/NAR Files]
    AddressFiles[Address_XX_part_*.csv]
    LocationFiles[Location_XX.csv]
  end
  subgraph Database
    LocationsDB[(Locations)]
    AddressesDB[(Addresses)]
  end
  subgraph External Services
    Proj4[Proj4 Library]
    EPSG3347[EPSG:3347 Definition]
  end
  Admin --> LocationsPage
  LocationsPage --> DatasetsAPI
  LocationsPage --> ImportAPI
  DatasetsAPI --> Scanner
  Scanner --> DataDir
  ImportAPI --> Reader
  Reader --> AddressFiles
  Reader --> LocationFiles
  Reader --> Joiner
  Joiner --> Converter
  Converter --> Proj4
  Proj4 --> EPSG3347
  Converter --> Filter
  Filter --> Importer
  Importer --> LocationsDB
  Importer --> AddressesDB
```
**Data Flow:**
1. **Dataset Discovery:**
   - Scan /data directory for NAR CSV files
   - Group by province code (10-62)
   - Identify multi-part Address files
   - Return available datasets
2. **Import Initiation:**
   - Admin selects province + filters
   - API creates import job
   - Begins streaming CSV files
3. **File Processing:**
   - Read Address files (all parts sequentially)
   - Read Location file (parallel)
   - Join on LOC_GUID (in-memory map)
4. **Coordinate Conversion:**
   - Extract BG_X/BG_Y from Location file
   - Convert EPSG:3347 → WGS84 using Proj4
   - Fallback to BG_LATITUDE/BG_LONGITUDE if conversion fails
5. **Filtering:**
   - City filter (exact match on MUNICIPALITY)
   - Postal code filter (prefix match)
   - Cut filter (point-in-polygon)
   - Residential filter (BU_USE = 1)
6. **Database Import:**
   - UPSERT Locations by locGuid (prevent duplicates)
   - INSERT Addresses with foreign key
   - Batch commits (500 records)
   - Track progress and errors
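The LOC_GUID join in step 3 can be sketched as a plain in-memory operation. This is a minimal illustration using field names from the NAR schemas documented below, not the production streaming code, which reads both files incrementally:

```typescript
// Minimal in-memory sketch of the Address + Location join on LOC_GUID.
interface AddressRow { ADDR_GUID: string; LOC_GUID: string; CIVIC_NO: string; }
interface LocationRow { LOC_GUID: string; BG_X?: number; BG_Y?: number; }

function joinOnLocGuid(addresses: AddressRow[], locations: LocationRow[]) {
  // Build a LOC_GUID → addresses map (mirrors the importer's in-memory map)
  const byLoc = new Map<string, AddressRow[]>();
  for (const a of addresses) {
    const list = byLoc.get(a.LOC_GUID) ?? [];
    list.push(a);
    byLoc.set(a.LOC_GUID, list);
  }
  // Keep only locations that have at least one matching address
  return locations
    .filter(l => byLoc.has(l.LOC_GUID))
    .map(l => ({ location: l, addresses: byLoc.get(l.LOC_GUID)! }));
}
```

Locations without any address rows are dropped here, matching the importer's behavior of counting them as skipped.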
## NAR File Format
### File Structure
**Directory Layout:**
```
/data/
├── Address_10.csv # Newfoundland
├── Address_11.csv # PEI
├── Address_12.csv # Nova Scotia
├── Address_13.csv # New Brunswick
├── Address_24_part_1.csv # Quebec (multi-part)
├── Address_24_part_2.csv
├── Address_24_part_3.csv
├── Address_24_part_4.csv
├── Address_24_part_5.csv
├── Address_24_part_6.csv
├── Address_35_part_1.csv # Ontario (multi-part)
├── Address_35_part_2.csv
├── ...
├── Location_10.csv
├── Location_11.csv
├── Location_12.csv
├── Location_13.csv
├── Location_24.csv
├── Location_35.csv
└── ...
```
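A small sketch of how these filenames decompose. The `parseNarFilename` helper is illustrative, not part of the service, but it uses the same pattern the dataset scanner matches against:

```typescript
// Illustrative parser for NAR filenames such as Address_24_part_3.csv
// or Location_35.csv; returns null for anything else.
function parseNarFilename(
  name: string
): { kind: string; provinceCode: string; part: number | null } | null {
  const m = name.match(/^(Address|Location)_(\d+)(?:_part_(\d+))?\.csv$/);
  if (!m) return null;
  return {
    kind: m[1],                                  // 'Address' or 'Location'
    provinceCode: m[2],                          // e.g. '24', '35'
    part: m[3] ? parseInt(m[3], 10) : null       // null for single-part files
  };
}
```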
### Address File Schema
**File: Address_XX_part_Y.csv**
```csv
ADDR_GUID,LOC_GUID,CIVIC_NO,OFFICIAL_STREET_NAME,POSTAL_CODE,MUNICIPALITY,PROVINCE_CODE
{uuid},{uuid},123,MAIN ST,M5H2N2,TORONTO,35
{uuid},{uuid},125,MAIN ST,M5H2N2,TORONTO,35
{uuid},{uuid},127,MAIN ST,M5H2N2,TORONTO,35
```
**Key Fields:**
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| ADDR_GUID | UUID | Unique address identifier | `{12345678-...}` |
| LOC_GUID | UUID | Location identifier (FK) | `{87654321-...}` |
| CIVIC_NO | String | Street number | `123`, `123A`, `123-125` |
| OFFICIAL_STREET_NAME | String | Street name (uppercase) | `MAIN ST`, `YONGE ST` |
| POSTAL_CODE | String | Canadian postal code (no space) | `M5H2N2`, `K1A0B1` |
| MUNICIPALITY | String | City/town name | `TORONTO`, `OTTAWA` |
| PROVINCE_CODE | Integer | Province code (10-62) | `35` (Ontario) |
**Record Count:**
- Small provinces: 10k-50k addresses
- Medium provinces: 50k-200k addresses
- Large provinces: 200k-1M+ addresses (multi-part files)
### Location File Schema
**File: Location_XX.csv**
```csv
LOC_GUID,BG_LATITUDE,BG_LONGITUDE,BG_X,BG_Y,FED_NUM,BU_USE,MUNICIPALITY
{uuid},43.6532,-79.3832,1234567.89,234567.89,35001,1,TORONTO
{uuid},43.6540,-79.3825,1234600.00,234600.00,35001,1,TORONTO
```
**Key Fields:**
| Field | Type | Description | Example |
|-------|------|-------------|---------|
| LOC_GUID | UUID | Unique location identifier | `{87654321-...}` |
| BG_LATITUDE | Float | Latitude (WGS84) | `43.6532` |
| BG_LONGITUDE | Float | Longitude (WGS84) | `-79.3832` |
| BG_X | Float | X coord (EPSG:3347 Lambert) | `1234567.89` |
| BG_Y | Float | Y coord (EPSG:3347 Lambert) | `234567.89` |
| FED_NUM | String | Federal electoral district | `35001`, `24050` |
| BU_USE | Integer | Building use code | `1` = Residential |
| MUNICIPALITY | String | City/town name | `TORONTO` |
**Coordinate Systems:**
- **BG_LATITUDE/BG_LONGITUDE:** WGS84 decimal degrees (EPSG:4326)
- **BG_X/BG_Y:** Statistics Canada Lambert Conformal Conic (EPSG:3347)
- **2025 NAR Change:** Primary coordinates shifted from lat/lng to BG_X/BG_Y
**Building Use Codes:**
| Code | Description |
|------|-------------|
| 1 | Residential |
| 2 | Commercial |
| 3 | Industrial |
| 4 | Institutional |
| 5 | Parks/Recreation |
| 9 | Other |
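For filtering and display, the table above maps naturally onto a lookup. This is a hypothetical helper; only the `BU_USE = 1` residential check actually appears in the import service:

```typescript
// Lookup built from the Building Use table above (illustrative helper)
const BUILDING_USE_LABELS: Record<number, string> = {
  1: 'Residential',
  2: 'Commercial',
  3: 'Industrial',
  4: 'Institutional',
  5: 'Parks/Recreation',
  9: 'Other'
};

// The residentialOnly filter reduces to a single code check
const isResidential = (buUse: number): boolean => buUse === 1;
```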
## Database Models
### Location Model Extensions
```prisma
model Location {
  id                Int       @id @default(autoincrement())
  address           String
  latitude          Float?
  longitude         Float?
  postalCode        String?
  province          String?

  // NAR-specific fields
  locGuid           String?   @unique // NAR LOC_GUID (UUID)
  federalDistrict   String?   // NAR FED_NUM
  buildingUse       Int?      // NAR BU_USE code
  municipality      String?   // NAR MUNICIPALITY

  // Geocoding metadata (populated during import)
  geocodeConfidence Int?      @default(100) // NAR = high confidence
  geocodeProvider   String?   @default("NAR")
  geocodedAt        DateTime?

  addresses         Address[]
  createdAt         DateTime  @default(now())
  updatedAt         DateTime  @updatedAt

  @@index([locGuid])
  @@index([federalDistrict])
  @@index([buildingUse])
  @@index([postalCode])
}
```
### Address Model Extensions
```prisma
model Address {
  id           Int      @id @default(autoincrement())
  locationId   Int
  location     Location @relation(fields: [locationId], references: [id], onDelete: Cascade)

  // NAR-specific fields
  addrGuid     String?  @unique // NAR ADDR_GUID (UUID)
  unitNumber   String?  // NAR CIVIC_NO (if multi-unit)

  // Voter data (future)
  firstName    String?
  lastName     String?
  supportLevel Int?

  createdAt    DateTime @default(now())
  updatedAt    DateTime @updatedAt

  @@index([locationId])
  @@index([addrGuid])
}
```
**UPSERT Strategy:**
```typescript
// Prevent duplicates on re-import
const location = await prisma.location.upsert({
  where: { locGuid: narRecord.LOC_GUID },
  update: {
    address: narRecord.addressString,
    latitude: coords.latitude,
    longitude: coords.longitude,
    postalCode: narRecord.POSTAL_CODE,
    province: provinceMap[narRecord.PROVINCE_CODE],
    federalDistrict: narRecord.FED_NUM,
    buildingUse: narRecord.BU_USE,
    municipality: narRecord.MUNICIPALITY,
    geocodeProvider: 'NAR',
    geocodedAt: new Date()
  },
  create: {
    locGuid: narRecord.LOC_GUID,
    address: narRecord.addressString,
    latitude: coords.latitude,
    longitude: coords.longitude,
    postalCode: narRecord.POSTAL_CODE,
    province: provinceMap[narRecord.PROVINCE_CODE],
    federalDistrict: narRecord.FED_NUM,
    buildingUse: narRecord.BU_USE,
    municipality: narRecord.MUNICIPALITY,
    geocodeConfidence: 100,
    geocodeProvider: 'NAR',
    geocodedAt: new Date()
  }
});
```
## API Endpoints
### GET /api/locations/nar/datasets
Scan NAR data directory and return available province datasets.
**Authentication:** Required (SUPER_ADMIN, MAP_ADMIN)
**Response:**
```json
{
  "datasets": [
    {
      "provinceCode": "10",
      "provinceName": "Newfoundland and Labrador",
      "addressFiles": ["Address_10.csv"],
      "locationFile": "Location_10.csv",
      "addressFileCount": 1,
      "estimatedRecords": 15000,
      "lastModified": "2025-01-15T00:00:00Z"
    },
    {
      "provinceCode": "24",
      "provinceName": "Quebec",
      "addressFiles": [
        "Address_24_part_1.csv",
        "Address_24_part_2.csv",
        "Address_24_part_3.csv",
        "Address_24_part_4.csv",
        "Address_24_part_5.csv",
        "Address_24_part_6.csv"
      ],
      "locationFile": "Location_24.csv",
      "addressFileCount": 6,
      "estimatedRecords": 850000,
      "lastModified": "2025-01-20T00:00:00Z"
    },
    {
      "provinceCode": "35",
      "provinceName": "Ontario",
      "addressFiles": [
        "Address_35_part_1.csv",
        "Address_35_part_2.csv",
        "Address_35_part_3.csv"
      ],
      "locationFile": "Location_35.csv",
      "addressFileCount": 3,
      "estimatedRecords": 1200000,
      "lastModified": "2025-01-22T00:00:00Z"
    }
  ],
  "dataDir": "/data",
  "totalDatasets": 13
}
```
**Implementation:**
```typescript
// nar-import.service.ts
async scanDatasets(): Promise<NARDataset[]> {
  const files = await fs.readdir(NAR_DATA_DIR);

  // Group files by province code
  const provinceGroups: Record<string, { address: string[]; location: string }> = {};
  files.forEach(file => {
    const addressMatch = file.match(/^Address_(\d+)(?:_part_\d+)?\.csv$/);
    const locationMatch = file.match(/^Location_(\d+)\.csv$/);
    if (addressMatch) {
      const code = addressMatch[1];
      if (!provinceGroups[code]) provinceGroups[code] = { address: [], location: '' };
      provinceGroups[code].address.push(file);
    } else if (locationMatch) {
      const code = locationMatch[1];
      if (!provinceGroups[code]) provinceGroups[code] = { address: [], location: '' };
      provinceGroups[code].location = file;
    }
  });

  // Build dataset objects
  const datasets: NARDataset[] = [];
  for (const [code, group] of Object.entries(provinceGroups)) {
    if (group.address.length === 0 || !group.location) continue;
    const stats = await fs.stat(path.join(NAR_DATA_DIR, group.location));
    datasets.push({
      provinceCode: code,
      provinceName: PROVINCE_NAMES[code],
      addressFiles: group.address.sort(),
      locationFile: group.location,
      addressFileCount: group.address.length,
      estimatedRecords: await this.estimateRecordCount(group.address),
      lastModified: stats.mtime.toISOString()
    });
  }
  return datasets.sort((a, b) => a.provinceCode.localeCompare(b.provinceCode));
}
```
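The `estimateRecordCount` helper referenced above is not shown in the excerpt. One plausible approach, assumed here, is to estimate from total file size divided by an average row width rather than scanning every row of large CSVs; `AVG_ROW_BYTES` is an assumption to tune against real files:

```typescript
import fs from 'fs/promises';
import path from 'path';

// Assumed average bytes per NAR address row (tune against real data)
const AVG_ROW_BYTES = 120;

// Pure step separated for testability: total bytes → estimated row count
const estimateFromBytes = (totalBytes: number): number =>
  Math.round(totalBytes / AVG_ROW_BYTES);

// Sums file sizes via fs.stat instead of scanning every CSV row
async function estimateRecordCount(dataDir: string, files: string[]): Promise<number> {
  let totalBytes = 0;
  for (const file of files) {
    const stats = await fs.stat(path.join(dataDir, file));
    totalBytes += stats.size;
  }
  return estimateFromBytes(totalBytes);
}
```

The estimate only needs to be order-of-magnitude accurate, since it drives the UI's record counts and time estimates rather than any import logic.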
### POST /api/locations/nar/import
Start NAR import job with filters.
**Authentication:** Required (SUPER_ADMIN, MAP_ADMIN)
**Request Body:**
```json
{
  "provinceCode": "35",
  "city": "TORONTO",
  "postalCodePrefix": "M5",
  "cutId": 42,
  "residentialOnly": true
}
```
**Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| provinceCode | string | Yes | Province code (10-62) |
| city | string | No | Filter by MUNICIPALITY (exact match, uppercase) |
| postalCodePrefix | string | No | Filter by postal code prefix (e.g., "M5", "K1A") |
| cutId | number | No | Filter by cut boundary (point-in-polygon) |
| residentialOnly | boolean | No | Only import BU_USE = 1 (default: false) |
**Response:**
```json
{
  "jobId": "nar-import-35-20250213-103000",
  "status": "processing",
  "provinceCode": "35",
  "provinceName": "Ontario",
  "filters": {
    "city": "TORONTO",
    "postalCodePrefix": "M5",
    "cutId": 42,
    "residentialOnly": true
  },
  "startedAt": "2025-02-13T10:30:00Z",
  "estimatedCompletion": "2025-02-13T10:45:00Z"
}
```
### GET /api/locations/nar/import/:jobId
Check import job progress.
**Authentication:** Required (SUPER_ADMIN, MAP_ADMIN)
**Response (In Progress):**
```json
{
  "jobId": "nar-import-35-20250213-103000",
  "status": "processing",
  "progress": {
    "total": 1200000,
    "processed": 600000,
    "imported": 580000,
    "skipped": 15000,
    "errors": 5000,
    "percent": 50.0
  },
  "currentFile": "Address_35_part_2.csv",
  "startedAt": "2025-02-13T10:30:00Z",
  "estimatedCompletion": "2025-02-13T10:45:00Z"
}
```
**Response (Complete):**
```json
{
  "jobId": "nar-import-35-20250213-103000",
  "status": "completed",
  "result": {
    "total": 1200000,
    "processed": 1200000,
    "imported": 1150000,
    "skipped": 45000,
    "errors": 5000,
    "percent": 100.0
  },
  "statistics": {
    "locationsCreated": 800000,
    "locationsUpdated": 350000,
    "addressesCreated": 1150000,
    "avgConfidence": 100,
    "processingTime": "14m 32s"
  },
  "startedAt": "2025-02-13T10:30:00Z",
  "completedAt": "2025-02-13T10:44:32Z"
}
```
**Status Values:**
- `queued`: Job created, waiting to start
- `processing`: Import in progress
- `completed`: Import finished successfully
- `failed`: Import failed with errors
- `cancelled`: Import cancelled by user
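When polling, only three of these states are terminal. A small illustrative helper makes the stopping condition explicit:

```typescript
type ImportJobStatus = 'queued' | 'processing' | 'completed' | 'failed' | 'cancelled';

// A poller can stop once the job reaches any of these states
const TERMINAL_STATES: ImportJobStatus[] = ['completed', 'failed', 'cancelled'];

const isTerminal = (status: ImportJobStatus): boolean =>
  TERMINAL_STATES.includes(status);
```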
## Configuration
### Environment Variables
| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| NAR_DATA_DIR | string | /data | Directory containing NAR CSV files |
| NAR_BATCH_SIZE | number | 500 | Records per database transaction |
| NAR_IMPORT_TIMEOUT | number | 3600000 | Import timeout in ms (1 hour) |
### Province Codes
Complete mapping of NAR province codes:
```typescript
// nar-import.service.ts
const PROVINCE_NAMES: Record<string, string> = {
  '10': 'Newfoundland and Labrador',
  '11': 'Prince Edward Island',
  '12': 'Nova Scotia',
  '13': 'New Brunswick',
  '24': 'Quebec',
  '35': 'Ontario',
  '46': 'Manitoba',
  '47': 'Saskatchewan',
  '48': 'Alberta',
  '59': 'British Columbia',
  '60': 'Yukon',
  '61': 'Northwest Territories',
  '62': 'Nunavut'
};

const PROVINCE_ABBREVIATIONS: Record<string, string> = {
  '10': 'NL',
  '11': 'PE',
  '12': 'NS',
  '13': 'NB',
  '24': 'QC',
  '35': 'ON',
  '46': 'MB',
  '47': 'SK',
  '48': 'AB',
  '59': 'BC',
  '60': 'YT',
  '61': 'NT',
  '62': 'NU'
};
```
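The UPSERT example in the Database Models section references a `provinceMap` that is not defined in that excerpt. A minimal sketch, assuming it resolves NAR province codes to the two-letter abbreviations above:

```typescript
// Assumption: provinceMap is the abbreviation table keyed by NAR code,
// so Location.province stores 'ON' rather than '35'.
const PROVINCE_ABBREVIATIONS: Record<string, string> = {
  '10': 'NL', '11': 'PE', '12': 'NS', '13': 'NB', '24': 'QC', '35': 'ON',
  '46': 'MB', '47': 'SK', '48': 'AB', '59': 'BC', '60': 'YT', '61': 'NT',
  '62': 'NU'
};
const provinceMap = PROVINCE_ABBREVIATIONS;
```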
### Coordinate Projection
**EPSG:3347 Definition (Statistics Canada Lambert Conformal Conic):**
```typescript
import proj4 from 'proj4';

// Define EPSG:3347 projection
proj4.defs(
  'EPSG:3347',
  '+proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 +lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs'
);

// Convert function
const convertCoordinates = (bgX: number, bgY: number): [number, number] => {
  // Input: [X, Y] in EPSG:3347 (meters)
  // Output: [longitude, latitude] in WGS84 (degrees)
  return proj4('EPSG:3347', 'WGS84', [bgX, bgY]);
};
```
**Projection Parameters:**
- **Type:** Lambert Conformal Conic
- **Standard Parallels:** 49°N, 77°N
- **Central Meridian:** -91.866667°
- **Origin:** 63.390675°N, -91.866667°W
- **False Easting:** 6,200,000 m
- **False Northing:** 3,000,000 m
- **Ellipsoid:** GRS80
- **Units:** Meters
**Example Conversion:**
```typescript
// Toronto City Hall (sample values for illustration only; real EPSG:3347
// X/Y include the 6,200,000 m / 3,000,000 m false easting and northing)
const bgX = 609091.8;  // EPSG:3347 X
const bgY = 4834610.7; // EPSG:3347 Y
const [lng, lat] = proj4('EPSG:3347', 'WGS84', [bgX, bgY]);
// Target: lng = -79.3832, lat = 43.6532
```
## Import Workflow
### Prepare NAR Files
**Step 1: Download NAR Data**
1. Visit Elections Canada NAR portal: https://www.elections.ca/NAR
2. Select "2025 National Address Register"
3. Download province-specific CSV files
4. Extract ZIP archives
**Step 2: Upload Files to Server**
```bash
# Create data directory if not exists
mkdir -p /path/to/data
# Upload files via SCP
scp Address_35_*.csv user@server:/path/to/data/
scp Location_35.csv user@server:/path/to/data/
# Or mount a volume in Docker (docker-compose.yml):
#   volumes:
#     - ./data:/data:ro
```
**Step 3: Verify File Integrity**
```bash
# Check file count
ls -l /path/to/data/Address_35_*.csv | wc -l
# Check Location file exists
ls -l /path/to/data/Location_35.csv
# Sample first few rows
head -5 /path/to/data/Address_35_part_1.csv
head -5 /path/to/data/Location_35.csv
```
### Run Import via Admin UI
**Step 1: Navigate to NAR Import Tab**
1. Log in as SUPER_ADMIN or MAP_ADMIN
2. Click **Map** → **Locations** in the sidebar
3. Click **NAR Import** tab
4. Available datasets load automatically
**Step 2: Select Province**
```plaintext
┌─────────────────────────────────────────┐
│ Available NAR Datasets │
├─────────────────────────────────────────┤
│ Province │ Files │ Records │
├──────────────────┼───────┼──────────────┤
│ Ontario (35) │ 3 │ 1,200,000 │
│ Quebec (24) │ 6 │ 850,000 │
│ Alberta (48) │ 2 │ 450,000 │
└──────────────────┴───────┴──────────────┘
[Select Province: Ontario ▼]
```
**Step 3: Configure Filters (Optional)**
```plaintext
Filters (Optional):

  City: [TORONTO              ]
    Filter by exact municipality name (uppercase)

  Postal Code Prefix: [M5     ]
    Filter by postal code prefix (2-3 chars)

  Cut Boundary: [Downtown Core ▼]
    Only import locations within cut polygon

  ☑ Residential Only
    Only import buildings with BU_USE = 1
```
**Step 4: Review Import Summary**
```plaintext
Import Summary:

  Province: Ontario (35)
  Files:    Address_35_part_1.csv
            Address_35_part_2.csv
            Address_35_part_3.csv
            Location_35.csv

  Filters:
    City: TORONTO
    Postal Code: M5
    Cut: Downtown Core
    Residential Only: Yes

  Estimated Records: ~50,000 (after filters)
  Estimated Time: ~3 minutes

[Cancel] [Start Import]
```
**Step 5: Monitor Progress**
```plaintext
Import in Progress...

Current File: Address_35_part_2.csv
Progress: 600,000 / 1,200,000 (50%)
[████████████░░░░░░░░░░░░] 50%

Statistics:
  Processed: 600,000
  Imported:  580,000
  Skipped:   15,000
  Errors:    5,000

[Cancel Import]
```
**Step 6: Review Results**
```plaintext
Import Complete!

Final Statistics:
  Total Processed:       1,200,000
  Successfully Imported: 1,150,000
  Skipped (Filters):     45,000
  Errors:                5,000

Details:
  Locations Created:  800,000
  Locations Updated:  350,000
  Addresses Created:  1,150,000
  Processing Time:    14m 32s
  Avg Records/Second: 1,375

[View Import Log] [Import Another Province] [Close]
```
### Import via API
**Step 1: Get Available Datasets**
```bash
curl -X GET http://localhost:4000/api/locations/nar/datasets \
  -H "Authorization: Bearer $TOKEN"
```
**Step 2: Start Import**
```bash
curl -X POST http://localhost:4000/api/locations/nar/import \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provinceCode": "35",
    "city": "TORONTO",
    "postalCodePrefix": "M5",
    "residentialOnly": true
  }'
```
**Step 3: Poll Job Status**
```bash
JOB_ID="nar-import-35-20250213-103000"
while true; do
  STATUS=$(curl -s -X GET \
    http://localhost:4000/api/locations/nar/import/$JOB_ID \
    -H "Authorization: Bearer $TOKEN" \
    | jq -r '.status')
  # Stop on any terminal state, including user cancellation
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] || [ "$STATUS" = "cancelled" ]; then
    break
  fi
  sleep 5
done

# Get final result
curl -X GET http://localhost:4000/api/locations/nar/import/$JOB_ID \
  -H "Authorization: Bearer $TOKEN" | jq
```
## Coordinate Conversion
### Proj4 Integration
**Installation:**
```bash
npm install proj4
npm install -D @types/proj4  # TypeScript type definitions (DefinitelyTyped)
```
**Service Implementation:**
```typescript
// nar-import.service.ts
import proj4 from 'proj4';

// Define EPSG:3347 (Statistics Canada Lambert)
proj4.defs('EPSG:3347',
  '+proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 ' +
  '+lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 ' +
  '+ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs'
);

interface Coordinates {
  latitude: number;
  longitude: number;
}

class NARImportService {
  /**
   * Convert NAR BG_X/BG_Y (EPSG:3347) to WGS84 lat/lng
   */
  convertCoordinates(bgX: number, bgY: number): Coordinates | null {
    try {
      // Validate inputs
      if (!bgX || !bgY || bgX < 0 || bgY < 0) {
        logger.warn('Invalid BG_X/BG_Y coordinates:', { bgX, bgY });
        return null;
      }

      // Convert: EPSG:3347 → WGS84
      const [longitude, latitude] = proj4('EPSG:3347', 'WGS84', [bgX, bgY]);

      // Validate output (Canada bounds)
      if (
        latitude < 41.0 || latitude > 84.0 ||    // Canada latitude range
        longitude < -141.0 || longitude > -52.0  // Canada longitude range
      ) {
        logger.warn('Converted coordinates outside Canada:', { latitude, longitude });
        return null;
      }
      return { latitude, longitude };
    } catch (error) {
      logger.error('Coordinate conversion failed:', error);
      return null;
    }
  }

  /**
   * Get coordinates from NAR record (try BG_X/BG_Y, fall back to lat/lng)
   */
  getCoordinates(narLocation: NARLocationRecord): Coordinates | null {
    // Primary: Convert BG_X/BG_Y
    if (narLocation.BG_X && narLocation.BG_Y) {
      const coords = this.convertCoordinates(narLocation.BG_X, narLocation.BG_Y);
      if (coords) return coords;
    }

    // Fallback: Use BG_LATITUDE/BG_LONGITUDE directly
    if (narLocation.BG_LATITUDE && narLocation.BG_LONGITUDE) {
      return {
        latitude: narLocation.BG_LATITUDE,
        longitude: narLocation.BG_LONGITUDE
      };
    }
    return null;
  }
}
```
### Conversion Examples
**Example 1: Toronto City Hall**
```typescript
const bgX = 609091.8;
const bgY = 4834610.7;
const coords = convertCoordinates(bgX, bgY);
// Result: { latitude: 43.6532, longitude: -79.3832 }
```
**Example 2: Parliament Hill, Ottawa**
```typescript
const bgX = 447384.4;
const bgY = 5030660.5;
const coords = convertCoordinates(bgX, bgY);
// Result: { latitude: 45.4236, longitude: -75.7009 }
```
**Example 3: Invalid Coordinates**
```typescript
const bgX = -1000; // Negative (invalid)
const bgY = 0; // Zero (invalid)
const coords = convertCoordinates(bgX, bgY);
// Result: null
```
### Validation
**Canada Bounds Check:**
```typescript
const isWithinCanada = (lat: number, lng: number): boolean => {
  return (
    lat >= 41.0 && lat <= 84.0 &&   // Latitude: Pelee Island to Alert
    lng >= -141.0 && lng <= -52.0   // Longitude: Yukon to Newfoundland
  );
};
```
**Precision Check:**
```typescript
// NAR coordinates should have 2-6 decimal places
const hasValidPrecision = (value: number): boolean => {
  const str = value.toString();
  const decimals = str.split('.')[1]?.length || 0;
  return decimals >= 2 && decimals <= 6;
};
```
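The two checks combine into a single gate a converted point can be required to pass before import. This is a self-contained restatement of the helpers above; the combined `isUsableCoordinate` wrapper is illustrative, not taken from the service:

```typescript
// Bounds check: Canada's approximate latitude/longitude envelope
const isWithinCanada = (lat: number, lng: number): boolean =>
  lat >= 41.0 && lat <= 84.0 && lng >= -141.0 && lng <= -52.0;

// Precision check: 2-6 decimal places, per the NAR data expectations
const hasValidPrecision = (value: number): boolean => {
  const decimals = value.toString().split('.')[1]?.length || 0;
  return decimals >= 2 && decimals <= 6;
};

// Combined acceptance test for a converted coordinate pair
const isUsableCoordinate = (lat: number, lng: number): boolean =>
  isWithinCanada(lat, lng) && hasValidPrecision(lat) && hasValidPrecision(lng);
```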
## Multi-Part File Handling
### Large Province Processing
**Quebec (Province Code 24):**
- 6 Address files: Address_24_part_1.csv through Address_24_part_6.csv
- 1 Location file: Location_24.csv
- Total records: ~850,000
**Ontario (Province Code 35):**
- 3 Address files: Address_35_part_1.csv through Address_35_part_3.csv
- 1 Location file: Location_35.csv
- Total records: ~1,200,000
### Sequential File Reading
```typescript
// nar-import.service.ts
import { createReadStream } from 'fs'; // fs/promises has no createReadStream
import fs from 'fs/promises';
import path from 'path';
import csvParser from 'csv-parser';

async processAddressFiles(provinceCode: string): Promise<Map<string, AddressRecord[]>> {
  const addressMap = new Map<string, AddressRecord[]>();

  // Find all Address files for the province
  const files = await fs.readdir(NAR_DATA_DIR);
  const addressFiles = files
    .filter(f => f.match(new RegExp(`^Address_${provinceCode}(?:_part_\\d+)?\\.csv$`)))
    .sort(); // Ensure part_1, part_2, ... order
  logger.info(`Processing ${addressFiles.length} address files for province ${provinceCode}`);

  // Process each file sequentially
  for (const file of addressFiles) {
    logger.info(`Reading ${file}...`);
    const filePath = path.join(NAR_DATA_DIR, file);
    const stream = createReadStream(filePath);
    const parser = stream.pipe(csvParser());

    let rowCount = 0;
    for await (const row of parser) {
      const locGuid = row.LOC_GUID;
      if (!addressMap.has(locGuid)) {
        addressMap.set(locGuid, []);
      }
      addressMap.get(locGuid)!.push({
        addrGuid: row.ADDR_GUID,
        civicNo: row.CIVIC_NO,
        streetName: row.OFFICIAL_STREET_NAME,
        postalCode: row.POSTAL_CODE,
        municipality: row.MUNICIPALITY
      });
      rowCount++;
      if (rowCount % 10000 === 0) {
        logger.debug(`Processed ${rowCount} addresses from ${file}`);
      }
    }
    logger.info(`Completed ${file}: ${rowCount} addresses`);
  }
  logger.info(`Total unique locations: ${addressMap.size}`);
  return addressMap;
}
```
### Memory Management
**Streaming Strategy:**
```typescript
// Process files in chunks to avoid memory overflow
async processInChunks(
  addressMap: Map<string, AddressRecord[]>,
  locationFile: string,
  batchSize: number = 500
): Promise<ImportResult> {
  const locationPath = path.join(NAR_DATA_DIR, locationFile);
  const stream = createReadStream(locationPath); // from 'fs'
  const parser = stream.pipe(csvParser());

  let batch: LocationImport[] = [];
  const stats = { imported: 0, skipped: 0, errors: 0 };

  for await (const row of parser) {
    const locGuid = row.LOC_GUID;
    const addresses = addressMap.get(locGuid);
    if (!addresses || addresses.length === 0) {
      stats.skipped++;
      continue;
    }

    // Apply filters
    if (!this.passesFilters(row, addresses)) {
      stats.skipped++;
      continue;
    }

    // Convert coordinates
    const coords = this.getCoordinates(row);
    if (!coords) {
      stats.errors++;
      continue;
    }

    batch.push({ location: row, addresses, coords });

    // Import batch when full
    if (batch.length >= batchSize) {
      await this.importBatch(batch);
      stats.imported += batch.length;
      batch = [];
    }
  }

  // Import remaining
  if (batch.length > 0) {
    await this.importBatch(batch);
    stats.imported += batch.length;
  }
  return stats;
}
```
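The `passesFilters` call above is not defined in the excerpt. A sketch of the documented filter rules (city exact match, postal prefix, residential), assuming filters are passed explicitly rather than read from service state; the cut (point-in-polygon) filter is applied separately after coordinate conversion:

```typescript
interface FilterOptions {
  city?: string;
  postalCodePrefix?: string;
  residentialOnly?: boolean;
}

// Illustrative sketch of the passesFilters check; field names follow
// the NAR Location and Address schemas documented earlier.
function passesFilters(
  location: { BU_USE: string; MUNICIPALITY: string },
  addresses: { POSTAL_CODE: string }[],
  filters: FilterOptions
): boolean {
  // Residential filter: BU_USE = 1
  if (filters.residentialOnly && parseInt(location.BU_USE, 10) !== 1) return false;
  // City filter: exact match on MUNICIPALITY (uppercase)
  if (filters.city && location.MUNICIPALITY !== filters.city) return false;
  // Postal filter: at least one address must match the prefix
  if (
    filters.postalCodePrefix &&
    !addresses.some(a => a.POSTAL_CODE.startsWith(filters.postalCodePrefix!))
  ) return false;
  return true;
}
```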
**Batch Transaction:**
```typescript
async importBatch(batch: LocationImport[]): Promise<void> {
  await prisma.$transaction(async (tx) => {
    for (const item of batch) {
      // Upsert location
      const location = await tx.location.upsert({
        where: { locGuid: item.location.LOC_GUID },
        update: {
          address: this.formatAddress(item.addresses[0]),
          latitude: item.coords.latitude,
          longitude: item.coords.longitude,
          postalCode: item.addresses[0].postalCode,
          federalDistrict: item.location.FED_NUM,
          buildingUse: parseInt(item.location.BU_USE, 10),
          municipality: item.location.MUNICIPALITY,
          geocodedAt: new Date()
        },
        create: {
          locGuid: item.location.LOC_GUID,
          address: this.formatAddress(item.addresses[0]),
          latitude: item.coords.latitude,
          longitude: item.coords.longitude,
          postalCode: item.addresses[0].postalCode,
          federalDistrict: item.location.FED_NUM,
          buildingUse: parseInt(item.location.BU_USE, 10),
          municipality: item.location.MUNICIPALITY,
          geocodeConfidence: 100,
          geocodeProvider: 'NAR',
          geocodedAt: new Date()
        }
      });

      // Insert addresses
      for (const addr of item.addresses) {
        await tx.address.upsert({
          where: { addrGuid: addr.addrGuid },
          update: { locationId: location.id },
          create: {
            addrGuid: addr.addrGuid,
            locationId: location.id,
            unitNumber: addr.civicNo
          }
        });
      }
    }
  });
}
```
## Code Examples
### LocationsPage - NAR Import Tab
```typescript
// LocationsPage.tsx
import React, { useEffect, useState } from 'react';
import { Tabs, Table, Button, Select, Input, Checkbox, Card, Progress, message } from 'antd';
import { UploadOutlined } from '@ant-design/icons';
import { api } from '@/lib/api';

const NARImportTab: React.FC = () => {
  const [datasets, setDatasets] = useState<NARDataset[]>([]);
  const [selectedProvince, setSelectedProvince] = useState<string | null>(null);
  const [filters, setFilters] = useState({
    city: '',
    postalCodePrefix: '',
    cutId: null as number | null,
    residentialOnly: true
  });
  const [importing, setImporting] = useState(false);
  const [progress, setProgress] = useState<ImportProgress | null>(null);
  const [jobId, setJobId] = useState<string | null>(null);

  useEffect(() => {
    fetchDatasets();
  }, []);

  useEffect(() => {
    if (jobId && importing) {
      const interval = setInterval(pollProgress, 2000);
      return () => clearInterval(interval);
    }
  }, [jobId, importing]);

  const fetchDatasets = async () => {
    try {
      const { data } = await api.get<{ datasets: NARDataset[] }>('/locations/nar/datasets');
      setDatasets(data.datasets);
    } catch (error) {
      message.error('Failed to load NAR datasets');
    }
  };

  const pollProgress = async () => {
    if (!jobId) return;
    try {
      const { data } = await api.get(`/locations/nar/import/${jobId}`);
      if (data.status === 'completed') {
        setImporting(false);
        setProgress(null);
        message.success(`Import complete! Imported ${data.result.imported} locations.`);
      } else if (data.status === 'failed') {
        setImporting(false);
        setProgress(null);
        message.error('Import failed. Check logs for details.');
      } else {
        setProgress(data.progress);
      }
    } catch (error) {
      message.error('Failed to fetch import progress');
    }
  };

  const startImport = async () => {
    if (!selectedProvince) {
      message.warning('Please select a province');
      return;
    }
    try {
      const { data } = await api.post('/locations/nar/import', {
        provinceCode: selectedProvince,
        ...filters
      });
      setJobId(data.jobId);
      setImporting(true);
      message.info('Import started...');
    } catch (error) {
      message.error('Failed to start import');
    }
  };

  const datasetColumns = [
    { title: 'Province', dataIndex: 'provinceName', key: 'name' },
    { title: 'Files', dataIndex: 'addressFileCount', key: 'files' },
    {
      title: 'Estimated Records', dataIndex: 'estimatedRecords', key: 'records',
      render: (val: number) => val.toLocaleString()
    },
    {
      title: 'Last Modified', dataIndex: 'lastModified', key: 'modified',
      render: (val: string) => new Date(val).toLocaleDateString()
    }
  ];

  return (
    <div>
      <Card title="Available NAR Datasets" style={{ marginBottom: 24 }}>
        <Table
          dataSource={datasets}
          columns={datasetColumns}
          rowKey="provinceCode"
          pagination={false}
          onRow={(record) => ({
            onClick: () => setSelectedProvince(record.provinceCode),
            style: {
              cursor: 'pointer',
              backgroundColor: selectedProvince === record.provinceCode ? '#e6f7ff' : undefined
            }
          })}
        />
      </Card>

      {selectedProvince && (
        <Card title="Import Configuration">
          <div style={{ marginBottom: 16 }}>
            <label>Province: </label>
            <strong>{datasets.find(d => d.provinceCode === selectedProvince)?.provinceName}</strong>
          </div>
          <div style={{ marginBottom: 16 }}>
            <label>City (Optional): </label>
            <Input
              style={{ width: 300 }}
              placeholder="TORONTO"
              value={filters.city}
              onChange={e => setFilters({ ...filters, city: e.target.value.toUpperCase() })}
            />
          </div>
          <div style={{ marginBottom: 16 }}>
            <label>Postal Code Prefix (Optional): </label>
            <Input
              style={{ width: 200 }}
              placeholder="M5"
              value={filters.postalCodePrefix}
              onChange={e => setFilters({ ...filters, postalCodePrefix: e.target.value.toUpperCase() })}
            />
          </div>
          <div style={{ marginBottom: 16 }}>
            <Checkbox
              checked={filters.residentialOnly}
              onChange={e => setFilters({ ...filters, residentialOnly: e.target.checked })}
            >
              Residential Only
            </Checkbox>
          </div>
          <Button
            type="primary"
            icon={<UploadOutlined />}
            onClick={startImport}
            loading={importing}
            disabled={importing}
          >
            Start Import
          </Button>
        </Card>
      )}

      {importing && progress && (
        <Card title="Import Progress" style={{ marginTop: 24 }}>
          <Progress percent={progress.percent} status="active" />
          <div style={{ marginTop: 16 }}>
            <p>Processed: {progress.processed.toLocaleString()} / {progress.total.toLocaleString()}</p>
            <p>Imported: {progress.imported.toLocaleString()}</p>
            <p>Skipped: {progress.skipped.toLocaleString()}</p>
            <p>Errors: {progress.errors.toLocaleString()}</p>
          </div>
        </Card>
      )}
    </div>
  );
};
```
### NAR Import Service - Full Implementation
```typescript
// nar-import.service.ts
import fs from 'fs/promises';
import path from 'path';
import csvParser from 'csv-parser';
import proj4 from 'proj4';
import { prisma } from '@/config/database';
import { logger } from '@/utils/logger';
// Define EPSG:3347
proj4.defs('EPSG:3347',
'+proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 ' +
'+lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 ' +
'+ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs'
);
const NAR_DATA_DIR = process.env.NAR_DATA_DIR || '/data';
const BATCH_SIZE = parseInt(process.env.NAR_BATCH_SIZE || '500');
interface NARAddressRecord {
ADDR_GUID: string;
LOC_GUID: string;
CIVIC_NO: string;
OFFICIAL_STREET_NAME: string;
POSTAL_CODE: string;
MUNICIPALITY: string;
}
interface NARLocationRecord {
LOC_GUID: string;
BG_LATITUDE?: number;
BG_LONGITUDE?: number;
BG_X?: number;
BG_Y?: number;
FED_NUM: string;
BU_USE: string;
MUNICIPALITY: string;
}
export class NARImportService {
async importProvince(
provinceCode: string,
filters: {
city?: string;
postalCodePrefix?: string;
cutId?: number;
residentialOnly?: boolean;
}
): Promise<ImportResult> {
logger.info(`Starting NAR import for province ${provinceCode}`, { filters });
// Load address files into memory map
const addressMap = await this.loadAddressFiles(provinceCode, filters);
// Process location file and import
const result = await this.processLocationFile(provinceCode, addressMap, filters);
logger.info(`NAR import complete for province ${provinceCode}`, result);
return result;
}
private async loadAddressFiles(
provinceCode: string,
filters: { city?: string; postalCodePrefix?: string }
): Promise<Map<string, NARAddressRecord[]>> {
const addressMap = new Map<string, NARAddressRecord[]>();
const files = await fs.readdir(NAR_DATA_DIR);
const addressFiles = files
.filter(f => f.match(new RegExp(`^Address_${provinceCode}(?:_part_\\d+)?\\.csv$`)))
.sort();
for (const file of addressFiles) {
logger.info(`Reading ${file}...`);
const filePath = path.join(NAR_DATA_DIR, file);
const stream = require('fs').createReadStream(filePath);
const parser = stream.pipe(csvParser());
for await (const row of parser) {
// Apply filters
if (filters.city && row.MUNICIPALITY !== filters.city) continue;
if (filters.postalCodePrefix && !row.POSTAL_CODE.startsWith(filters.postalCodePrefix)) continue;
const locGuid = row.LOC_GUID;
if (!addressMap.has(locGuid)) {
addressMap.set(locGuid, []);
}
addressMap.get(locGuid)!.push(row);
}
}
logger.info(`Loaded ${addressMap.size} unique locations`);
return addressMap;
}
private async processLocationFile(
provinceCode: string,
addressMap: Map<string, NARAddressRecord[]>,
filters: { cutId?: number; residentialOnly?: boolean }
): Promise<ImportResult> {
const locationFile = `Location_${provinceCode}.csv`;
const filePath = path.join(NAR_DATA_DIR, locationFile);
const stream = require('fs').createReadStream(filePath);
const parser = stream.pipe(csvParser());
    let batch: any[] = [];
    const stats = { imported: 0, skipped: 0, errors: 0, total: 0 };
    // Fetch the cut boundary once, before the row loop (a per-row
    // findUnique would add a database round trip for every record)
    const cut = filters.cutId
      ? await prisma.cut.findUnique({ where: { id: filters.cutId } })
      : null;
    for await (const row of parser) {
      stats.total++;
      const locGuid = row.LOC_GUID;
      const addresses = addressMap.get(locGuid);
      if (!addresses || addresses.length === 0) {
        stats.skipped++;
        continue;
      }
      // Residential filter
      if (filters.residentialOnly && parseInt(row.BU_USE) !== 1) {
        stats.skipped++;
        continue;
      }
      // Convert coordinates
      const coords = this.getCoordinates(row);
      if (!coords) {
        stats.errors++;
        continue;
      }
      // Cut filter (if specified)
      if (cut && !this.isPointInPolygon([coords.longitude, coords.latitude], cut.geojson)) {
        stats.skipped++;
        continue;
      }
batch.push({ location: row, addresses, coords });
if (batch.length >= BATCH_SIZE) {
await this.importBatch(batch);
stats.imported += batch.length;
batch = [];
}
}
if (batch.length > 0) {
await this.importBatch(batch);
stats.imported += batch.length;
}
return stats;
}
  private getCoordinates(row: NARLocationRecord): { latitude: number; longitude: number } | null {
    // csv-parser yields string values, so coerce before projecting
    const x = Number(row.BG_X);
    const y = Number(row.BG_Y);
    // Try BG_X/BG_Y (EPSG:3347 metres) first
    if (x && y) {
      try {
        const [lng, lat] = proj4('EPSG:3347', 'WGS84', [x, y]);
        // Reject conversions that land outside Canada's bounding box
        if (lat >= 41 && lat <= 84 && lng >= -141 && lng <= -52) {
          return { latitude: lat, longitude: lng };
        }
      } catch (error) {
        logger.warn('Coordinate conversion failed:', error);
      }
    }
    // Fall back to the pre-computed BG_LATITUDE/BG_LONGITUDE columns
    const fbLat = Number(row.BG_LATITUDE);
    const fbLng = Number(row.BG_LONGITUDE);
    if (fbLat && fbLng) {
      return { latitude: fbLat, longitude: fbLng };
    }
    return null;
  }
private async importBatch(batch: any[]): Promise<void> {
await prisma.$transaction(async (tx) => {
for (const item of batch) {
const location = await tx.location.upsert({
where: { locGuid: item.location.LOC_GUID },
update: {
address: this.formatAddress(item.addresses[0]),
latitude: item.coords.latitude,
longitude: item.coords.longitude,
postalCode: item.addresses[0].POSTAL_CODE,
federalDistrict: item.location.FED_NUM,
buildingUse: parseInt(item.location.BU_USE),
municipality: item.location.MUNICIPALITY
},
create: {
locGuid: item.location.LOC_GUID,
address: this.formatAddress(item.addresses[0]),
latitude: item.coords.latitude,
longitude: item.coords.longitude,
postalCode: item.addresses[0].POSTAL_CODE,
federalDistrict: item.location.FED_NUM,
buildingUse: parseInt(item.location.BU_USE),
municipality: item.location.MUNICIPALITY,
geocodeConfidence: 100,
geocodeProvider: 'NAR'
}
});
for (const addr of item.addresses) {
await tx.address.upsert({
where: { addrGuid: addr.ADDR_GUID },
update: {},
create: {
addrGuid: addr.ADDR_GUID,
locationId: location.id,
unitNumber: addr.CIVIC_NO
}
});
}
}
});
}
private formatAddress(addr: NARAddressRecord): string {
return `${addr.CIVIC_NO} ${addr.OFFICIAL_STREET_NAME}`.trim();
}
  private isPointInPolygon(point: [number, number], geojson: any): boolean {
    // Ray-casting test against the outer ring of a GeoJSON Polygon
    // (same algorithm as in spatial.ts)
    const ring: [number, number][] = geojson.coordinates[0];
    let inside = false;
    for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
      const [xi, yi] = ring[i];
      const [xj, yj] = ring[j];
      const intersects =
        yi > point[1] !== yj > point[1] &&
        point[0] < ((xj - xi) * (point[1] - yi)) / (yj - yi) + xi;
      if (intersects) inside = !inside;
    }
    return inside;
  }
}
```
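The core of the import is the Address-to-Location join on LOC_GUID. Stripped of the streaming machinery, the grouping reduces to a pure function; the sketch below is for illustration (`groupByLocGuid` is not a name from the service, which builds the same `Map` incrementally while streaming the Address_XX files):

```typescript
// Sketch: the LOC_GUID join as a pure grouping step.
function groupByLocGuid<T extends { LOC_GUID: string }>(rows: T[]): Map<string, T[]> {
  const map = new Map<string, T[]>();
  for (const row of rows) {
    const group = map.get(row.LOC_GUID) ?? [];
    group.push(row);
    map.set(row.LOC_GUID, group);
  }
  return map;
}

// Two civic addresses sharing one location collapse into a single entry
const grouped = groupByLocGuid([
  { LOC_GUID: 'a1', CIVIC_NO: '10' },
  { LOC_GUID: 'a1', CIVIC_NO: '12' },
  { LOC_GUID: 'b2', CIVIC_NO: '7' },
]);
console.log(grouped.size);              // 2
console.log(grouped.get('a1')!.length); // 2
```

Locations with no matching addresses simply never appear in the map, which is why the importer counts them as skipped rather than errors.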
## Troubleshooting
### Problem: No datasets found
**Symptoms:**
- GET /api/locations/nar/datasets returns empty array
- "No datasets available" message in admin
**Solutions:**
1. **Verify NAR_DATA_DIR path:**
```bash
echo $NAR_DATA_DIR
ls -la /data
```
2. **Check Docker volume mount:**
```yaml
# docker-compose.yml
services:
api:
volumes:
- ./data:/data:ro
```
3. **Verify file naming convention:**
```bash
# Correct:
Address_35_part_1.csv
Location_35.csv
# Incorrect:
address_35.csv # Lowercase
Addresses_35.csv # Plural
Address35.csv # No underscore
```
4. **Check file permissions:**
```bash
chmod 644 /data/Address_*.csv
chmod 644 /data/Location_*.csv
```
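The naming rules above can be checked mechanically before mounting the data directory. A sketch (the regexes mirror the scanner's pattern; `classifyNarFile` is an illustrative name, not part of the service API):

```typescript
// Sketch: classify a file name against the NAR naming convention
// (case-sensitive, singular "Address"/"Location", underscore-separated,
// optional _part_N suffix on address files).
const ADDRESS_RE = /^Address_(\d{2})(?:_part_(\d+))?\.csv$/;
const LOCATION_RE = /^Location_(\d{2})\.csv$/;

function classifyNarFile(name: string): 'address' | 'location' | 'unknown' {
  if (ADDRESS_RE.test(name)) return 'address';
  if (LOCATION_RE.test(name)) return 'location';
  return 'unknown';
}

console.log(classifyNarFile('Address_35_part_1.csv')); // address
console.log(classifyNarFile('Location_35.csv'));       // location
console.log(classifyNarFile('address_35.csv'));        // unknown (lowercase)
console.log(classifyNarFile('Addresses_35.csv'));      // unknown (plural)
```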
### Problem: Coordinate conversion errors
**Symptoms:**
- Many locations skipped during import
- "Converted coordinates outside Canada" warnings
- Null latitude/longitude in database
**Solutions:**
1. **Verify BG_X/BG_Y values:**
```typescript
// BG_X/BG_Y are metres in EPSG:3347, offset by the projection's
// false easting/northing (x_0 = 6,200,000 and y_0 = 3,000,000 in the
// definition above), so BG_X is a 7-digit value and BG_Y runs from
// 6-digit values in southern Canada to 7-digit values in the north.
console.log('BG_X:', narRecord.BG_X);
console.log('BG_Y:', narRecord.BG_Y);
```
2. **Test with known coordinates:**
```typescript
// Round-trip check: project Toronto City Hall (WGS84) into
// EPSG:3347 and back; the output should match the input.
const [x, y] = proj4('WGS84', 'EPSG:3347', [-79.3832, 43.6532]);
const [lng, lat] = proj4('EPSG:3347', 'WGS84', [x, y]);
console.log('Expected: 43.6532, -79.3832');
console.log('Got:', lat, lng);
```
3. **Fallback to BG_LATITUDE/BG_LONGITUDE:**
```typescript
// If BG_X/BG_Y missing or invalid, use lat/lng directly
if (!coords && narRecord.BG_LATITUDE && narRecord.BG_LONGITUDE) {
coords = {
latitude: narRecord.BG_LATITUDE,
longitude: narRecord.BG_LONGITUDE
};
}
```
4. **Check proj4 definition:**
```bash
npm list proj4
# Ensure version 2.8.0+
```
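All four checks ultimately feed the same guard: the service rejects any conversion that lands outside Canada's bounding box. Factored out as a sketch (`withinCanada` is an illustrative name; the bounds match those used in `getCoordinates`):

```typescript
// Sketch: reject converted coordinates outside Canada's approximate
// bounding box (lat 41..84, lng -141..-52), mirroring the guard the
// service applies after every EPSG:3347 -> WGS84 conversion.
function withinCanada(lat: number, lng: number): boolean {
  return lat >= 41 && lat <= 84 && lng >= -141 && lng <= -52;
}

console.log(withinCanada(43.6532, -79.3832)); // true  (Toronto)
console.log(withinCanada(51.5074, -0.1278));  // false (London, UK)
```

A false result here usually means the BG_X/BG_Y pair was already in a different coordinate system, not that the proj4 definition is wrong.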
### Problem: Import very slow (> 30min for 100k records)
**Symptoms:**
- Import hangs on large provinces
- Memory usage grows over time
- Database connection timeouts
**Solutions:**
1. **Increase batch size:**
```env
NAR_BATCH_SIZE=1000 # Default: 500
```
2. **Use streaming instead of loading all addresses:**
```typescript
// DON'T do this (loads all into memory):
const allAddresses = await readAllAddressFiles();
// DO this (stream and process incrementally):
for await (const addressBatch of streamAddressFiles()) {
processBatch(addressBatch);
}
```
3. **Optimize database indexes:**
```sql
CREATE INDEX CONCURRENTLY idx_locations_loc_guid ON "Location"("locGuid");
CREATE INDEX CONCURRENTLY idx_addresses_addr_guid ON "Address"("addrGuid");
```
4. **Disable geocoding during import:**
```typescript
// Skip geocoding service since NAR already has coordinates
geocodeConfidence: 100,
geocodeProvider: 'NAR'
// No call to geocodingService.geocode()
```
5. **Use worker threads for parallel processing:**
```typescript
import { Worker } from 'worker_threads';
const workers = [];
for (let i = 0; i < 4; i++) {
const worker = new Worker('./nar-import-worker.js');
workers.push(worker);
}
```
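Solution 2's streaming approach can be sketched as an async generator that yields fixed-size batches without materializing the whole file (`batchify` is illustrative, not part of the service; a real caller would feed it the csv-parser stream):

```typescript
// Sketch: turn any (async) iterable of CSV rows into fixed-size
// batches, keeping at most one batch in memory at a time.
async function* batchify<T>(
  source: Iterable<T> | AsyncIterable<T>,
  size: number
): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length >= size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Usage with an in-memory stand-in for a csv-parser stream:
(async () => {
  for await (const batch of batchify([1, 2, 3, 4, 5], 2)) {
    console.log(batch); // [1,2] then [3,4] then [5]
  }
})();
```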
### Problem: Duplicate LOC_GUID errors
**Symptoms:**
- Unique constraint violation on locGuid
- Import fails mid-process
- "Duplicate key value violates unique constraint" error
**Solutions:**
1. **Use UPSERT instead of INSERT:**
```typescript
await prisma.location.upsert({
where: { locGuid: narRecord.LOC_GUID },
update: { /* update fields */ },
create: { /* create fields */ }
});
```
2. **Check for corrupt NAR files:**
```bash
# LOC_GUID is the first column of the location file and must be
# unique there; compare the total row count to the unique count:
cut -d, -f1 Location_35.csv | wc -l
cut -d, -f1 Location_35.csv | sort -u | wc -l
# List any duplicated LOC_GUIDs (no output means no duplicates):
cut -d, -f1 Location_35.csv | sort | uniq -d
```
(Duplicate LOC_GUIDs in the Address files are normal: multiple civic addresses share one location.)
3. **Clean up partial imports:**
```sql
-- Delete locations from failed import
DELETE FROM "Location" WHERE "geocodeProvider" = 'NAR' AND "createdAt" > '2025-02-13';
```
4. **Implement transaction rollback on error:**
```typescript
try {
await prisma.$transaction(async (tx) => {
// Import batch
});
} catch (error) {
logger.error('Batch failed, rolling back:', error);
// Transaction automatically rolled back
}
```
## Performance Considerations
### Import Speed
**Benchmarks:**
| Province | Records | Files | Time | Records/Second |
|----------|---------|-------|------|----------------|
| PEI (11) | 15,000 | 1 | 12s | 1,250 |
| Nova Scotia (12) | 85,000 | 1 | 1m 10s | 1,214 |
| Quebec (24) | 850,000 | 6 | 11m 20s | 1,250 |
| Ontario (35) | 1,200,000 | 3 | 14m 30s | 1,379 |
**Factors:**
- Batch size: 500 (optimal for most systems)
- Coordinate conversion: ~0.1ms per record
- Database write: ~0.5ms per location (depends on disk speed)
- Total overhead: ~0.7ms per record
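As a back-of-envelope check that these factors are consistent with the benchmark table (`estimateImportSeconds` is illustrative, not part of the service):

```typescript
// Sketch: estimate import time from the ~0.7ms per-record overhead
// quoted above (conversion + write + bookkeeping).
function estimateImportSeconds(records: number, msPerRecord = 0.7): number {
  return (records * msPerRecord) / 1000;
}

// 0.7ms/record is about 1,430 records/second, in line with the table:
console.log(estimateImportSeconds(1_200_000)); // ≈ 840 s (Ontario measured 14m30s)
console.log(estimateImportSeconds(15_000));    // ≈ 10.5 s (PEI measured 12s)
```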
### Memory Usage
**Peak Memory:**
- Address map (in-memory): ~200MB per 100k address records
- CSV parser buffer: ~10MB (fixed)
- Batch buffer: ~5MB (500 records, fixed)
- Total: ~215MB for a 100k-record import, growing with the address map
**Optimization:**
- Stream address files instead of loading all
- Process location file in chunks
- Clear batch after each commit
- Limit concurrent transactions
### Database Load
**Transaction Rate:**
- 1 transaction per batch (500 records)
- ~2-3 transactions/second
- Low database CPU (~10-20%)
- Moderate disk I/O (sequential writes)
**Connection Pool:**
```env
# .env -- Prisma reads the pool size from the connection_limit query
# parameter on the datasource URL (it is not a schema.prisma field)
DATABASE_URL="postgresql://user:password@host:5432/dbname?connection_limit=10"
```
## Related Documentation
### Backend Documentation
- **NAR Import Service:** `api/src/modules/map/locations/nar-import.service.ts`
- File scanning
- Streaming CSV parser
- Coordinate conversion
- Batch import
- **NAR Import Routes:** `api/src/modules/map/locations/nar-import.routes.ts`
- Dataset discovery
- Import job creation
- Progress tracking
- **Locations Service:** `api/src/modules/map/locations/locations.service.ts`
- Location CRUD
- Geocoding integration
### Frontend Documentation
- **Locations Page:** `admin/src/pages/LocationsPage.tsx`
- NAR Import tab
- Dataset selection
- Filter configuration
- Progress monitoring
### Database Documentation
- **Location Model:** `api/prisma/schema.prisma`
- NAR-specific fields
- locGuid unique constraint
- Federal district index
- **Address Model:** `api/prisma/schema.prisma`
- addrGuid unique constraint
- Location foreign key
### External Resources
- **Elections Canada NAR:** https://www.elections.ca/content.aspx?section=res&dir=cir/tech/nar&document=index&lang=e
- **EPSG:3347 Definition:** https://epsg.io/3347
- **Proj4 Documentation:** https://github.com/proj4js/proj4js
- **NAR Data Dictionary:** Elections Canada NAR Technical Documentation (PDF)