# NAR Import System

## Overview

The National Address Register (NAR) import system enables bulk import of Canadian electoral data from Elections Canada. The system supports the 2025 NAR format with server-side streaming import, coordinate projection conversion, and comprehensive filtering options.

**Key Features:**

- Server-side streaming import (handles large datasets)
- NAR 2025 format support (BG_X/BG_Y Lambert projection)
- Address + Location file joining on LOC_GUID
- Proj4 coordinate conversion (EPSG:3347 → WGS84)
- Province selector (13 provinces/territories)
- Filtering: city, postal code, cut boundary, residential-only
- Multi-part file handling (large provinces)
- Progress tracking and error reporting
- Import statistics and validation

**Use Cases:**

- Initial campaign database setup
- Electoral district targeting
- NAR data updates (new redistribution)
- Multi-region campaign expansion
- Address database verification

**Architecture Highlights:**

- Streaming CSV parser (avoids memory limits)
- File-based LOC_GUID join
- Real-time coordinate projection
- Point-in-polygon cut filtering
- Transaction batching (500 records/commit)
- Duplicate prevention via UPSERT

## Architecture

```mermaid
flowchart TB
  subgraph Admin Interface
    Admin[Admin User]
    LocationsPage[LocationsPage - NAR Tab]
  end

  subgraph API Layer
    DatasetsAPI["/api/locations/nar/datasets"]
    ImportAPI["/api/locations/nar/import"]
  end

  subgraph NAR Import Service
    Scanner[File Scanner]
    Reader[CSV Stream Reader]
    Joiner[Address+Location Joiner]
    Converter[Coordinate Converter]
    Filter[Filter Pipeline]
    Importer[Bulk Importer]
  end

  subgraph File System
    DataDir["/data (NAR files)"]
    AddressFiles["Address_XX_part_*.csv"]
    LocationFiles["Location_XX.csv"]
  end

  subgraph Database
    LocationsDB[(Locations)]
    AddressesDB[(Addresses)]
  end

  subgraph External Services
    Proj4[Proj4 Library]
    EPSG3347["EPSG:3347 Definition"]
  end

  Admin --> LocationsPage
  LocationsPage --> DatasetsAPI
  LocationsPage --> ImportAPI

  DatasetsAPI --> Scanner
  Scanner --> DataDir

  ImportAPI --> Reader
  Reader --> AddressFiles
  Reader --> LocationFiles

  Reader --> Joiner
  Joiner --> Converter
  Converter --> Proj4
  Proj4 --> EPSG3347

  Converter --> Filter
  Filter --> Importer
  Importer --> LocationsDB
  Importer --> AddressesDB
```

**Data Flow:**

1. **Dataset Discovery:**
   - Scan /data directory for NAR CSV files
   - Group by province code (10-62)
   - Identify multi-part Address files
   - Return available datasets

2. **Import Initiation:**
   - Admin selects province + filters
   - API creates import job
   - Begins streaming CSV files

3. **File Processing:**
   - Read Address files (all parts sequentially)
   - Read Location file (parallel)
   - Join on LOC_GUID (in-memory map)

4. **Coordinate Conversion:**
   - Extract BG_X/BG_Y from Location file
   - Convert EPSG:3347 → WGS84 using Proj4
   - Fall back to BG_LATITUDE/BG_LONGITUDE if conversion fails

5. **Filtering:**
   - City filter (exact match on MUNICIPALITY)
   - Postal code filter (prefix match)
   - Cut filter (point-in-polygon)
   - Residential filter (BU_USE = 1)

6. **Database Import:**
   - UPSERT Locations by locGuid (prevent duplicates)
   - INSERT Addresses with foreign key
   - Batch commits (500 records)
   - Track progress and errors

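The filtering stage (step 5) can be sketched as a small predicate chain. Everything below — `passesFilters`, the record shapes, and the ray-casting `pointInPolygon` helper — is an illustrative assumption about how the pipeline might be structured, not the actual service code:

```typescript
// Illustrative record shapes (assumptions, not the service's real types).
interface NARLocationRow { MUNICIPALITY: string; BU_USE: string; }
interface NARAddressRow { POSTAL_CODE: string; }
interface ImportFilters {
  city?: string;
  postalCodePrefix?: string;
  residentialOnly?: boolean;
  cutPolygon?: [number, number][]; // [lng, lat] ring when a cut filter is active
}

// Ray-casting point-in-polygon test for the cut filter.
function pointInPolygon(lng: number, lat: number, ring: [number, number][]): boolean {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const intersects =
      (yi > lat) !== (yj > lat) &&
      lng < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
    if (intersects) inside = !inside;
  }
  return inside;
}

// Apply the four filters in order; any failure rejects the record.
function passesFilters(
  loc: NARLocationRow,
  addr: NARAddressRow,
  coords: { latitude: number; longitude: number },
  filters: ImportFilters
): boolean {
  if (filters.city && loc.MUNICIPALITY !== filters.city) return false; // exact match
  if (filters.postalCodePrefix && !addr.POSTAL_CODE.startsWith(filters.postalCodePrefix)) return false;
  if (filters.residentialOnly && loc.BU_USE !== '1') return false;
  if (filters.cutPolygon && !pointInPolygon(coords.longitude, coords.latitude, filters.cutPolygon)) return false;
  return true;
}
```

Filters are ordered cheapest-first so the point-in-polygon test only runs on records that already match the string filters.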
## NAR File Format

### File Structure

**Directory Layout:**

```
/data/
├── Address_10.csv           # Newfoundland
├── Address_11.csv           # PEI
├── Address_12.csv           # Nova Scotia
├── Address_13.csv           # New Brunswick
├── Address_24_part_1.csv    # Quebec (multi-part)
├── Address_24_part_2.csv
├── Address_24_part_3.csv
├── Address_24_part_4.csv
├── Address_24_part_5.csv
├── Address_24_part_6.csv
├── Address_35_part_1.csv    # Ontario (multi-part)
├── Address_35_part_2.csv
├── ...
├── Location_10.csv
├── Location_11.csv
├── Location_12.csv
├── Location_13.csv
├── Location_24.csv
├── Location_35.csv
└── ...
```

### Address File Schema

**File: Address_XX_part_Y.csv**

```csv
ADDR_GUID,LOC_GUID,CIVIC_NO,OFFICIAL_STREET_NAME,POSTAL_CODE,MUNICIPALITY,PROVINCE_CODE
{uuid},{uuid},123,MAIN ST,M5H2N2,TORONTO,35
{uuid},{uuid},125,MAIN ST,M5H2N2,TORONTO,35
{uuid},{uuid},127,MAIN ST,M5H2N2,TORONTO,35
```

**Key Fields:**

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| ADDR_GUID | UUID | Unique address identifier | `{12345678-...}` |
| LOC_GUID | UUID | Location identifier (FK) | `{87654321-...}` |
| CIVIC_NO | String | Street number | `123`, `123A`, `123-125` |
| OFFICIAL_STREET_NAME | String | Street name (uppercase) | `MAIN ST`, `YONGE ST` |
| POSTAL_CODE | String | Canadian postal code (no space) | `M5H2N2`, `K1A0B1` |
| MUNICIPALITY | String | City/town name | `TORONTO`, `OTTAWA` |
| PROVINCE_CODE | Integer | Province code (10-62) | `35` (Ontario) |

**Record Count:**

- Small provinces: 10k-50k addresses
- Medium provinces: 50k-200k addresses
- Large provinces: 200k-1M+ addresses (multi-part files)

### Location File Schema

**File: Location_XX.csv**

```csv
LOC_GUID,BG_LATITUDE,BG_LONGITUDE,BG_X,BG_Y,FED_NUM,BU_USE,MUNICIPALITY
{uuid},43.6532,-79.3832,1234567.89,234567.89,35001,1,TORONTO
{uuid},43.6540,-79.3825,1234600.00,234600.00,35001,1,TORONTO
```

**Key Fields:**

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| LOC_GUID | UUID | Unique location identifier | `{87654321-...}` |
| BG_LATITUDE | Float | Latitude (WGS84) | `43.6532` |
| BG_LONGITUDE | Float | Longitude (WGS84) | `-79.3832` |
| BG_X | Float | X coord (EPSG:3347 Lambert) | `1234567.89` |
| BG_Y | Float | Y coord (EPSG:3347 Lambert) | `234567.89` |
| FED_NUM | String | Federal electoral district | `35001`, `24050` |
| BU_USE | Integer | Building use code | `1` = Residential |
| MUNICIPALITY | String | City/town name | `TORONTO` |

**Coordinate Systems:**

- **BG_LATITUDE/BG_LONGITUDE:** WGS84 decimal degrees (EPSG:4326)
- **BG_X/BG_Y:** Statistics Canada Lambert Conformal Conic (EPSG:3347)
- **2025 NAR Change:** Primary coordinates shifted from lat/lng to BG_X/BG_Y

**Building Use Codes:**

| Code | Description |
|------|-------------|
| 1 | Residential |
| 2 | Commercial |
| 3 | Industrial |
| 4 | Institutional |
| 5 | Parks/Recreation |
| 9 | Other |

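The table above maps naturally to a small lookup, which the residential-only filter relies on. This helper is a sketch for reference, not part of the import service:

```typescript
// Sketch: BU_USE code lookup mirroring the table above.
const BUILDING_USE: Record<number, string> = {
  1: 'Residential',
  2: 'Commercial',
  3: 'Industrial',
  4: 'Institutional',
  5: 'Parks/Recreation',
  9: 'Other',
};

// The residential-only import filter keeps BU_USE = 1.
const isResidential = (buUse: number): boolean => buUse === 1;

// Human-readable label with a fallback for undocumented codes.
const describeBuildingUse = (buUse: number): string =>
  BUILDING_USE[buUse] ?? `Unknown (${buUse})`;
```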
## Database Models

### Location Model Extensions

```prisma
model Location {
  id         Int     @id @default(autoincrement())
  address    String
  latitude   Float?
  longitude  Float?
  postalCode String?
  province   String?

  // NAR-specific fields
  locGuid         String? @unique // NAR LOC_GUID (UUID)
  federalDistrict String?         // NAR FED_NUM
  buildingUse     Int?            // NAR BU_USE code
  municipality    String?         // NAR MUNICIPALITY

  // Geocoding metadata (populated during import)
  geocodeConfidence Int?      @default(100)   // NAR = high confidence
  geocodeProvider   String?   @default("NAR")
  geocodedAt        DateTime?

  addresses Address[]

  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  @@index([locGuid])
  @@index([federalDistrict])
  @@index([buildingUse])
  @@index([postalCode])
}
```

### Address Model Extensions

```prisma
model Address {
  id         Int      @id @default(autoincrement())
  locationId Int
  location   Location @relation(fields: [locationId], references: [id], onDelete: Cascade)

  // NAR-specific fields
  addrGuid   String? @unique // NAR ADDR_GUID (UUID)
  unitNumber String?         // NAR CIVIC_NO (if multi-unit)

  // Voter data (future)
  firstName    String?
  lastName     String?
  supportLevel Int?

  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  @@index([locationId])
  @@index([addrGuid])
}
```

**UPSERT Strategy:**

```typescript
// Prevent duplicates on re-import
const location = await prisma.location.upsert({
  where: { locGuid: narRecord.LOC_GUID },
  update: {
    address: narRecord.addressString,
    latitude: coords.latitude,
    longitude: coords.longitude,
    postalCode: narRecord.POSTAL_CODE,
    province: provinceMap[narRecord.PROVINCE_CODE],
    federalDistrict: narRecord.FED_NUM,
    buildingUse: narRecord.BU_USE,
    municipality: narRecord.MUNICIPALITY,
    geocodeProvider: 'NAR',
    geocodedAt: new Date()
  },
  create: {
    locGuid: narRecord.LOC_GUID,
    address: narRecord.addressString,
    latitude: coords.latitude,
    longitude: coords.longitude,
    postalCode: narRecord.POSTAL_CODE,
    province: provinceMap[narRecord.PROVINCE_CODE],
    federalDistrict: narRecord.FED_NUM,
    buildingUse: narRecord.BU_USE,
    municipality: narRecord.MUNICIPALITY,
    geocodeConfidence: 100,
    geocodeProvider: 'NAR',
    geocodedAt: new Date()
  }
});
```

## API Endpoints

### GET /api/locations/nar/datasets

Scan the NAR data directory and return available province datasets.

**Authentication:** Required (SUPER_ADMIN, MAP_ADMIN)

**Response:**

```json
{
  "datasets": [
    {
      "provinceCode": "10",
      "provinceName": "Newfoundland and Labrador",
      "addressFiles": ["Address_10.csv"],
      "locationFile": "Location_10.csv",
      "addressFileCount": 1,
      "estimatedRecords": 15000,
      "lastModified": "2025-01-15T00:00:00Z"
    },
    {
      "provinceCode": "24",
      "provinceName": "Quebec",
      "addressFiles": [
        "Address_24_part_1.csv",
        "Address_24_part_2.csv",
        "Address_24_part_3.csv",
        "Address_24_part_4.csv",
        "Address_24_part_5.csv",
        "Address_24_part_6.csv"
      ],
      "locationFile": "Location_24.csv",
      "addressFileCount": 6,
      "estimatedRecords": 850000,
      "lastModified": "2025-01-20T00:00:00Z"
    },
    {
      "provinceCode": "35",
      "provinceName": "Ontario",
      "addressFiles": [
        "Address_35_part_1.csv",
        "Address_35_part_2.csv",
        "Address_35_part_3.csv"
      ],
      "locationFile": "Location_35.csv",
      "addressFileCount": 3,
      "estimatedRecords": 1200000,
      "lastModified": "2025-01-22T00:00:00Z"
    }
  ],
  "dataDir": "/data",
  "totalDatasets": 13
}
```

**Implementation:**

```typescript
// nar-import.service.ts

import fs from 'fs/promises';
import path from 'path';

async scanDatasets(): Promise<NARDataset[]> {
  const files = await fs.readdir(NAR_DATA_DIR);

  // Group files by province code
  const provinceGroups: Record<string, { address: string[], location: string }> = {};

  files.forEach(file => {
    const addressMatch = file.match(/^Address_(\d+)(?:_part_\d+)?\.csv$/);
    const locationMatch = file.match(/^Location_(\d+)\.csv$/);

    if (addressMatch) {
      const code = addressMatch[1];
      if (!provinceGroups[code]) provinceGroups[code] = { address: [], location: '' };
      provinceGroups[code].address.push(file);
    } else if (locationMatch) {
      const code = locationMatch[1];
      if (!provinceGroups[code]) provinceGroups[code] = { address: [], location: '' };
      provinceGroups[code].location = file;
    }
  });

  // Build dataset objects
  const datasets: NARDataset[] = [];

  for (const [code, group] of Object.entries(provinceGroups)) {
    if (group.address.length === 0 || !group.location) continue;

    const stats = await fs.stat(path.join(NAR_DATA_DIR, group.location));

    datasets.push({
      provinceCode: code,
      provinceName: PROVINCE_NAMES[code],
      addressFiles: group.address.sort(),
      locationFile: group.location,
      addressFileCount: group.address.length,
      estimatedRecords: await this.estimateRecordCount(group.address),
      lastModified: stats.mtime.toISOString()
    });
  }

  return datasets.sort((a, b) => a.provinceCode.localeCompare(b.provinceCode));
}
```

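`estimateRecordCount` is called above but not shown. One plausible implementation — an assumption, not the service's actual code — avoids a full scan by sampling the head of each file for an average line length and dividing the file size by it:

```typescript
import fs from 'fs/promises';
import path from 'path';

// Assumption: same default data directory as the rest of the service.
const NAR_DATA_DIR = process.env.NAR_DATA_DIR || '/data';

// Estimate total records across address files without reading every row:
// sample up to the first 64 KiB of each file to get an average line length,
// then divide the file size by it (minus one header line).
async function estimateRecordCount(
  addressFiles: string[],
  dataDir: string = NAR_DATA_DIR
): Promise<number> {
  let total = 0;
  for (const file of addressFiles) {
    const filePath = path.join(dataDir, file);
    const { size } = await fs.stat(filePath);
    const handle = await fs.open(filePath, 'r');
    try {
      const buf = Buffer.alloc(Math.min(64 * 1024, size));
      await handle.read(buf, 0, buf.length, 0);
      const sample = buf.toString('utf8');
      const lines = sample.split('\n').filter(l => l.length > 0);
      const avgLineLength = sample.length / Math.max(lines.length, 1);
      total += Math.max(Math.round(size / avgLineLength) - 1, 0); // -1 for the header
    } finally {
      await handle.close();
    }
  }
  return total;
}
```

The estimate only needs to be good enough for the dataset listing; the import itself counts exact rows as it streams.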
### POST /api/locations/nar/import

Start a NAR import job with filters.

**Authentication:** Required (SUPER_ADMIN, MAP_ADMIN)

**Request Body:**

```json
{
  "provinceCode": "35",
  "city": "TORONTO",
  "postalCodePrefix": "M5",
  "cutId": 42,
  "residentialOnly": true
}
```

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| provinceCode | string | Yes | Province code (10-62) |
| city | string | No | Filter by MUNICIPALITY (exact match, uppercase) |
| postalCodePrefix | string | No | Filter by postal code prefix (e.g., "M5", "K1A") |
| cutId | number | No | Filter by cut boundary (point-in-polygon) |
| residentialOnly | boolean | No | Only import BU_USE = 1 (default: false) |

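NAR stores postal codes uppercase with no space, so a user-supplied prefix should be normalized before matching. This helper is an illustrative assumption about how the `postalCodePrefix` parameter might be prepared, not the endpoint's actual code:

```typescript
// Normalize a user-supplied postal code or prefix to NAR form:
// uppercase, with all whitespace removed (NAR stores "M5H2N2", not "M5H 2N2").
const normalizePostalPrefix = (input: string): string =>
  input.toUpperCase().replace(/\s+/g, '');

// Prefix match against a NAR-format postal code.
const matchesPostalPrefix = (postalCode: string, prefix: string): boolean =>
  postalCode.startsWith(normalizePostalPrefix(prefix));
```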
**Response:**

```json
{
  "jobId": "nar-import-35-20250213-103000",
  "status": "processing",
  "provinceCode": "35",
  "provinceName": "Ontario",
  "filters": {
    "city": "TORONTO",
    "postalCodePrefix": "M5",
    "cutId": 42,
    "residentialOnly": true
  },
  "startedAt": "2025-02-13T10:30:00Z",
  "estimatedCompletion": "2025-02-13T10:45:00Z"
}
```

### GET /api/locations/nar/import/:jobId

Check import job progress.

**Authentication:** Required (SUPER_ADMIN, MAP_ADMIN)

**Response (In Progress):**

```json
{
  "jobId": "nar-import-35-20250213-103000",
  "status": "processing",
  "progress": {
    "total": 1200000,
    "processed": 600000,
    "imported": 580000,
    "skipped": 15000,
    "errors": 5000,
    "percent": 50.0
  },
  "currentFile": "Address_35_part_2.csv",
  "startedAt": "2025-02-13T10:30:00Z",
  "estimatedCompletion": "2025-02-13T10:45:00Z"
}
```

**Response (Complete):**

```json
{
  "jobId": "nar-import-35-20250213-103000",
  "status": "completed",
  "result": {
    "total": 1200000,
    "processed": 1200000,
    "imported": 1150000,
    "skipped": 45000,
    "errors": 5000,
    "percent": 100.0
  },
  "statistics": {
    "locationsCreated": 800000,
    "locationsUpdated": 350000,
    "addressesCreated": 1150000,
    "avgConfidence": 100,
    "processingTime": "14m 32s"
  },
  "startedAt": "2025-02-13T10:30:00Z",
  "completedAt": "2025-02-13T10:44:32Z"
}
```

**Status Values:**

- `queued`: Job created, waiting to start
- `processing`: Import in progress
- `completed`: Import finished successfully
- `failed`: Import failed with errors
- `cancelled`: Import cancelled by user

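Client code can encode these states as a union type so polling loops know when to stop. The type mirrors the list above; the helper itself is an assumption, not part of the documented API:

```typescript
// The five job states returned by GET /api/locations/nar/import/:jobId.
type ImportStatus = 'queued' | 'processing' | 'completed' | 'failed' | 'cancelled';

// Terminal states: once one of these is reached, polling can stop.
const TERMINAL_STATUSES: ReadonlySet<ImportStatus> =
  new Set<ImportStatus>(['completed', 'failed', 'cancelled']);

const isTerminal = (status: ImportStatus): boolean => TERMINAL_STATUSES.has(status);
```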
## Configuration

### Environment Variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| NAR_DATA_DIR | string | /data | Directory containing NAR CSV files |
| NAR_BATCH_SIZE | number | 500 | Records per database transaction |
| NAR_IMPORT_TIMEOUT | number | 3600000 | Import timeout in ms (1 hour) |

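Reading these with their documented defaults might look like the following sketch. The variable names match the table; the parsing helper is an assumption, not the service's actual config code:

```typescript
// Parse an integer environment variable, falling back on the default
// when the variable is unset or not a valid number.
const intFromEnv = (name: string, fallback: number): number => {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
};

const NAR_DATA_DIR = process.env.NAR_DATA_DIR || '/data';
const NAR_BATCH_SIZE = intFromEnv('NAR_BATCH_SIZE', 500);
const NAR_IMPORT_TIMEOUT = intFromEnv('NAR_IMPORT_TIMEOUT', 3_600_000); // 1 hour
```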
### Province Codes

Complete mapping of NAR province codes:

```typescript
// nar-import.service.ts

const PROVINCE_NAMES: Record<string, string> = {
  '10': 'Newfoundland and Labrador',
  '11': 'Prince Edward Island',
  '12': 'Nova Scotia',
  '13': 'New Brunswick',
  '24': 'Quebec',
  '35': 'Ontario',
  '46': 'Manitoba',
  '47': 'Saskatchewan',
  '48': 'Alberta',
  '59': 'British Columbia',
  '60': 'Yukon',
  '61': 'Northwest Territories',
  '62': 'Nunavut'
};

const PROVINCE_ABBREVIATIONS: Record<string, string> = {
  '10': 'NL',
  '11': 'PE',
  '12': 'NS',
  '13': 'NB',
  '24': 'QC',
  '35': 'ON',
  '46': 'MB',
  '47': 'SK',
  '48': 'AB',
  '59': 'BC',
  '60': 'YT',
  '61': 'NT',
  '62': 'NU'
};
```

### Coordinate Projection

**EPSG:3347 Definition (Statistics Canada Lambert Conformal Conic):**

```typescript
import proj4 from 'proj4';

// Define EPSG:3347 projection
proj4.defs('EPSG:3347', '+proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 +lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs');

// Convert function
const convertCoordinates = (bgX: number, bgY: number): [number, number] => {
  // Input: [X, Y] in EPSG:3347 (meters)
  // Output: [longitude, latitude] in WGS84 (degrees)
  return proj4('EPSG:3347', 'WGS84', [bgX, bgY]);
};
```

**Projection Parameters:**

- **Type:** Lambert Conformal Conic
- **Standard Parallels:** 49°N, 77°N
- **Central Meridian:** 91.866667°W
- **Origin:** 63.390675°N, 91.866667°W
- **False Easting:** 6,200,000 m
- **False Northing:** 3,000,000 m
- **Ellipsoid:** GRS80
- **Units:** Meters

**Example Conversion:**

```typescript
// Toronto City Hall (illustrative sample values)
const bgX = 609091.8;  // EPSG:3347 X
const bgY = 4834610.7; // EPSG:3347 Y

const [lng, lat] = proj4('EPSG:3347', 'WGS84', [bgX, bgY]);
// Expected result: lng ≈ -79.3832, lat ≈ 43.6532
```

## Import Workflow

### Prepare NAR Files

**Step 1: Download NAR Data**

1. Visit Elections Canada NAR portal: https://www.elections.ca/NAR
2. Select "2025 National Address Register"
3. Download province-specific CSV files
4. Extract ZIP archives

**Step 2: Upload Files to Server**

```bash
# Create data directory if it does not exist
mkdir -p /path/to/data

# Upload files via SCP
scp Address_35_*.csv user@server:/path/to/data/
scp Location_35.csv user@server:/path/to/data/

# Or mount a volume in Docker
# docker-compose.yml:
#   volumes:
#     - ./data:/data:ro
```

**Step 3: Verify File Integrity**

```bash
# Check file count
ls -l /path/to/data/Address_35_*.csv | wc -l

# Check Location file exists
ls -l /path/to/data/Location_35.csv

# Sample first few rows
head -5 /path/to/data/Address_35_part_1.csv
head -5 /path/to/data/Location_35.csv
```

### Run Import via Admin UI

**Step 1: Navigate to NAR Import Tab**

1. Log in as SUPER_ADMIN or MAP_ADMIN
2. Click **Map** → **Locations** in sidebar
3. Click **NAR Import** tab
4. Available datasets load automatically

**Step 2: Select Province**

```plaintext
┌─────────────────────────────────────────┐
│ Available NAR Datasets                  │
├──────────────────┬───────┬──────────────┤
│ Province         │ Files │ Records      │
├──────────────────┼───────┼──────────────┤
│ Ontario (35)     │   3   │  1,200,000   │
│ Quebec (24)      │   6   │    850,000   │
│ Alberta (48)     │   2   │    450,000   │
└──────────────────┴───────┴──────────────┘

[Select Province: Ontario ▼]
```

**Step 3: Configure Filters (Optional)**

```plaintext
Filters (Optional):

City:               [TORONTO        ]
                    Filter by exact municipality name (uppercase)

Postal Code Prefix: [M5             ]
                    Filter by postal code prefix (2-3 chars)

Cut Boundary:       [Downtown Core ▼]
                    Only import locations within cut polygon

☑ Residential Only
  Only import buildings with BU_USE = 1
```

**Step 4: Review Import Summary**

```plaintext
Import Summary:

Province: Ontario (35)
Files:    Address_35_part_1.csv
          Address_35_part_2.csv
          Address_35_part_3.csv
          Location_35.csv

Filters:
  City: TORONTO
  Postal Code: M5
  Cut: Downtown Core
  Residential Only: Yes

Estimated Records: ~50,000 (after filters)
Estimated Time: ~3 minutes

[Cancel] [Start Import]
```

**Step 5: Monitor Progress**

```plaintext
Import in Progress...

Current File: Address_35_part_2.csv
Progress: 600,000 / 1,200,000 (50%)

[████████████░░░░░░░░░░░░] 50%

Statistics:
  Processed: 600,000
  Imported:  580,000
  Skipped:   15,000
  Errors:    5,000

[Cancel Import]
```

**Step 6: Review Results**

```plaintext
Import Complete!

Final Statistics:
  Total Processed:       1,200,000
  Successfully Imported: 1,150,000
  Skipped (Filters):     45,000
  Errors:                5,000

Details:
  Locations Created: 800,000
  Locations Updated: 350,000
  Addresses Created: 1,150,000

Processing Time: 14m 32s
Avg Records/Second: 1,375

[View Import Log] [Import Another Province] [Close]
```

### Import via API

**Step 1: Get Available Datasets**

```bash
curl -X GET http://localhost:4000/api/locations/nar/datasets \
  -H "Authorization: Bearer $TOKEN"
```

**Step 2: Start Import**

```bash
curl -X POST http://localhost:4000/api/locations/nar/import \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provinceCode": "35",
    "city": "TORONTO",
    "postalCodePrefix": "M5",
    "residentialOnly": true
  }'
```

**Step 3: Poll Job Status**

```bash
JOB_ID="nar-import-35-20250213-103000"

while true; do
  STATUS=$(curl -s -X GET \
    http://localhost:4000/api/locations/nar/import/$JOB_ID \
    -H "Authorization: Bearer $TOKEN" \
    | jq -r '.status')

  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
    break
  fi

  sleep 5
done

# Get final result
curl -X GET http://localhost:4000/api/locations/nar/import/$JOB_ID \
  -H "Authorization: Bearer $TOKEN" | jq
```

## Coordinate Conversion

### Proj4 Integration

**Installation:**

```bash
npm install proj4
# TypeScript types included in package
```

**Service Implementation:**

```typescript
// nar-import.service.ts

import proj4 from 'proj4';

// Define EPSG:3347 (Statistics Canada Lambert)
proj4.defs('EPSG:3347',
  '+proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 ' +
  '+lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 ' +
  '+ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs'
);

interface Coordinates {
  latitude: number;
  longitude: number;
}

class NARImportService {
  /**
   * Convert NAR BG_X/BG_Y (EPSG:3347) to WGS84 lat/lng
   */
  convertCoordinates(bgX: number, bgY: number): Coordinates | null {
    try {
      // Validate inputs
      if (!bgX || !bgY || bgX < 0 || bgY < 0) {
        logger.warn('Invalid BG_X/BG_Y coordinates:', { bgX, bgY });
        return null;
      }

      // Convert: EPSG:3347 → WGS84
      const [longitude, latitude] = proj4('EPSG:3347', 'WGS84', [bgX, bgY]);

      // Validate output (Canada bounds)
      if (
        latitude < 41.0 || latitude > 84.0 ||   // Canada latitude range
        longitude < -141.0 || longitude > -52.0 // Canada longitude range
      ) {
        logger.warn('Converted coordinates outside Canada:', { latitude, longitude });
        return null;
      }

      return { latitude, longitude };
    } catch (error) {
      logger.error('Coordinate conversion failed:', error);
      return null;
    }
  }

  /**
   * Get coordinates from NAR record (try BG_X/BG_Y, fall back to lat/lng)
   */
  getCoordinates(narLocation: NARLocationRecord): Coordinates | null {
    // Primary: Convert BG_X/BG_Y
    if (narLocation.BG_X && narLocation.BG_Y) {
      const coords = this.convertCoordinates(narLocation.BG_X, narLocation.BG_Y);
      if (coords) return coords;
    }

    // Fallback: Use BG_LATITUDE/BG_LONGITUDE directly
    if (narLocation.BG_LATITUDE && narLocation.BG_LONGITUDE) {
      return {
        latitude: narLocation.BG_LATITUDE,
        longitude: narLocation.BG_LONGITUDE
      };
    }

    return null;
  }
}
```

### Conversion Examples

**Example 1: Toronto City Hall**

```typescript
// Illustrative sample values
const bgX = 609091.8;
const bgY = 4834610.7;

const coords = convertCoordinates(bgX, bgY);
// Expected result: { latitude: 43.6532, longitude: -79.3832 }
```

**Example 2: Parliament Hill, Ottawa**

```typescript
// Illustrative sample values
const bgX = 447384.4;
const bgY = 5030660.5;

const coords = convertCoordinates(bgX, bgY);
// Expected result: { latitude: 45.4236, longitude: -75.7009 }
```

**Example 3: Invalid Coordinates**

```typescript
const bgX = -1000; // Negative (invalid)
const bgY = 0;     // Zero (invalid)

const coords = convertCoordinates(bgX, bgY);
// Result: null
```

### Validation

**Canada Bounds Check:**

```typescript
const isWithinCanada = (lat: number, lng: number): boolean => {
  return (
    lat >= 41.0 && lat <= 84.0 &&   // Latitude: Pelee Island to Alert
    lng >= -141.0 && lng <= -52.0   // Longitude: Yukon to Newfoundland
  );
};
```

**Precision Check:**

```typescript
// NAR coordinates should have 2-6 decimal places
const hasValidPrecision = (value: number): boolean => {
  const str = value.toString();
  const decimals = str.split('.')[1]?.length || 0;
  return decimals >= 2 && decimals <= 6;
};
```

## Multi-Part File Handling

### Large Province Processing

**Quebec (Province Code 24):**

- 6 Address files: Address_24_part_1.csv through Address_24_part_6.csv
- 1 Location file: Location_24.csv
- Total records: ~850,000

**Ontario (Province Code 35):**

- 3 Address files: Address_35_part_1.csv through Address_35_part_3.csv
- 1 Location file: Location_35.csv
- Total records: ~1,200,000

### Sequential File Reading

```typescript
// nar-import.service.ts

import { createReadStream } from 'fs';
import fs from 'fs/promises';
import path from 'path';
import csvParser from 'csv-parser';

async processAddressFiles(provinceCode: string): Promise<Map<string, AddressRecord[]>> {
  const addressMap = new Map<string, AddressRecord[]>();

  // Find all Address files for province
  const files = await fs.readdir(NAR_DATA_DIR);
  const addressFiles = files
    .filter(f => f.match(new RegExp(`^Address_${provinceCode}(?:_part_\\d+)?\\.csv$`)))
    .sort(); // Ensure part_1, part_2, ... order

  logger.info(`Processing ${addressFiles.length} address files for province ${provinceCode}`);

  // Process each file sequentially
  for (const file of addressFiles) {
    logger.info(`Reading ${file}...`);

    const filePath = path.join(NAR_DATA_DIR, file);
    const stream = createReadStream(filePath);
    const parser = stream.pipe(csvParser());

    let rowCount = 0;

    for await (const row of parser) {
      const locGuid = row.LOC_GUID;

      if (!addressMap.has(locGuid)) {
        addressMap.set(locGuid, []);
      }

      addressMap.get(locGuid)!.push({
        addrGuid: row.ADDR_GUID,
        civicNo: row.CIVIC_NO,
        streetName: row.OFFICIAL_STREET_NAME,
        postalCode: row.POSTAL_CODE,
        municipality: row.MUNICIPALITY
      });

      rowCount++;

      if (rowCount % 10000 === 0) {
        logger.debug(`Processed ${rowCount} addresses from ${file}`);
      }
    }

    logger.info(`Completed ${file}: ${rowCount} addresses`);
  }

  logger.info(`Total unique locations: ${addressMap.size}`);
  return addressMap;
}
```

### Memory Management

**Streaming Strategy:**

```typescript
// Process files in chunks to avoid memory overflow
async processInChunks(
  addressMap: Map<string, AddressRecord[]>,
  locationFile: string,
  batchSize: number = 500
): Promise<ImportResult> {
  const locationPath = path.join(NAR_DATA_DIR, locationFile);
  const stream = fs.createReadStream(locationPath);
  const parser = stream.pipe(csvParser());

  let batch: LocationImport[] = [];
  let stats = { imported: 0, skipped: 0, errors: 0 };

  for await (const row of parser) {
    const locGuid = row.LOC_GUID;
    const addresses = addressMap.get(locGuid);

    if (!addresses || addresses.length === 0) {
      stats.skipped++;
      continue;
    }

    // Apply filters
    if (!this.passesFilters(row, addresses)) {
      stats.skipped++;
      continue;
    }

    // Convert coordinates
    const coords = this.getCoordinates(row);
    if (!coords) {
      stats.errors++;
      continue;
    }

    batch.push({ location: row, addresses, coords });

    // Import batch when full
    if (batch.length >= batchSize) {
      await this.importBatch(batch);
      stats.imported += batch.length;
      batch = [];
    }
  }

  // Import remaining
  if (batch.length > 0) {
    await this.importBatch(batch);
    stats.imported += batch.length;
  }

  return stats;
}
```

**Batch Transaction:**

```typescript
async importBatch(batch: LocationImport[]): Promise<void> {
  await prisma.$transaction(async (tx) => {
    for (const item of batch) {
      // Upsert location
      const location = await tx.location.upsert({
        where: { locGuid: item.location.LOC_GUID },
        update: {
          address: this.formatAddress(item.addresses[0]),
          latitude: item.coords.latitude,
          longitude: item.coords.longitude,
          postalCode: item.addresses[0].postalCode,
          federalDistrict: item.location.FED_NUM,
          buildingUse: parseInt(item.location.BU_USE),
          municipality: item.location.MUNICIPALITY,
          geocodedAt: new Date()
        },
        create: {
          locGuid: item.location.LOC_GUID,
          address: this.formatAddress(item.addresses[0]),
          latitude: item.coords.latitude,
          longitude: item.coords.longitude,
          postalCode: item.addresses[0].postalCode,
          federalDistrict: item.location.FED_NUM,
          buildingUse: parseInt(item.location.BU_USE),
          municipality: item.location.MUNICIPALITY,
          geocodeConfidence: 100,
          geocodeProvider: 'NAR',
          geocodedAt: new Date()
        }
      });

      // Insert addresses
      for (const addr of item.addresses) {
        await tx.address.upsert({
          where: { addrGuid: addr.addrGuid },
          update: { locationId: location.id },
          create: {
            addrGuid: addr.addrGuid,
            locationId: location.id,
            unitNumber: addr.civicNo
          }
        });
      }
    }
  });
}
```

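`formatAddress` is called in the batch transaction above but not shown. A minimal sketch — an assumption about its behavior, joining the civic number, street name, and municipality — might look like:

```typescript
// Assumed shape of a joined NAR address record (mirrors processAddressFiles above).
interface AddressRecord {
  addrGuid: string;
  civicNo: string;
  streetName: string;
  postalCode: string;
  municipality: string;
}

// Sketch: build the Location.address string from the first NAR address record,
// e.g. "123 MAIN ST, TORONTO".
function formatAddress(addr: AddressRecord): string {
  const street = `${addr.civicNo} ${addr.streetName}`.trim();
  return addr.municipality ? `${street}, ${addr.municipality}` : street;
}
```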
## Code Examples
|
|
|
|
### LocationsPage - NAR Import Tab
|
|
|
|
```typescript
// LocationsPage.tsx

import React, { useEffect, useState } from 'react';
import { Tabs, Table, Button, Select, Input, Checkbox, Card, Progress, message } from 'antd';
import { UploadOutlined } from '@ant-design/icons';
import { api } from '@/lib/api';

// Shapes inferred from how the component uses the API responses below
interface NARDataset {
  provinceCode: string;
  provinceName: string;
  addressFileCount: number;
  estimatedRecords: number;
  lastModified: string;
}

interface ImportProgress {
  percent: number;
  processed: number;
  total: number;
  imported: number;
  skipped: number;
  errors: number;
}

const NARImportTab: React.FC = () => {
  const [datasets, setDatasets] = useState<NARDataset[]>([]);
  const [selectedProvince, setSelectedProvince] = useState<string | null>(null);
  const [filters, setFilters] = useState({
    city: '',
    postalCodePrefix: '',
    cutId: null as number | null,
    residentialOnly: true
  });
  const [importing, setImporting] = useState(false);
  const [progress, setProgress] = useState<ImportProgress | null>(null);
  const [jobId, setJobId] = useState<string | null>(null);

  useEffect(() => {
    fetchDatasets();
  }, []);

  useEffect(() => {
    if (jobId && importing) {
      const interval = setInterval(pollProgress, 2000);
      return () => clearInterval(interval);
    }
  }, [jobId, importing]);

  const fetchDatasets = async () => {
    try {
      const { data } = await api.get<{ datasets: NARDataset[] }>('/locations/nar/datasets');
      setDatasets(data.datasets);
    } catch (error) {
      message.error('Failed to load NAR datasets');
    }
  };

  const pollProgress = async () => {
    if (!jobId) return;

    try {
      const { data } = await api.get(`/locations/nar/import/${jobId}`);

      if (data.status === 'completed') {
        setImporting(false);
        setProgress(null);
        message.success(`Import complete! Imported ${data.result.imported} locations.`);
      } else if (data.status === 'failed') {
        setImporting(false);
        setProgress(null);
        message.error('Import failed. Check logs for details.');
      } else {
        setProgress(data.progress);
      }
    } catch (error) {
      message.error('Failed to fetch import progress');
    }
  };

  const startImport = async () => {
    if (!selectedProvince) {
      message.warning('Please select a province');
      return;
    }

    try {
      const { data } = await api.post('/locations/nar/import', {
        provinceCode: selectedProvince,
        ...filters
      });

      setJobId(data.jobId);
      setImporting(true);
      message.info('Import started...');
    } catch (error) {
      message.error('Failed to start import');
    }
  };

  const datasetColumns = [
    { title: 'Province', dataIndex: 'provinceName', key: 'name' },
    { title: 'Files', dataIndex: 'addressFileCount', key: 'files' },
    { title: 'Estimated Records', dataIndex: 'estimatedRecords', key: 'records',
      render: (val: number) => val.toLocaleString() },
    { title: 'Last Modified', dataIndex: 'lastModified', key: 'modified',
      render: (val: string) => new Date(val).toLocaleDateString() }
  ];

  return (
    <div>
      <Card title="Available NAR Datasets" style={{ marginBottom: 24 }}>
        <Table
          dataSource={datasets}
          columns={datasetColumns}
          rowKey="provinceCode"
          pagination={false}
          onRow={(record) => ({
            onClick: () => setSelectedProvince(record.provinceCode),
            style: {
              cursor: 'pointer',
              backgroundColor: selectedProvince === record.provinceCode ? '#e6f7ff' : undefined
            }
          })}
        />
      </Card>

      {selectedProvince && (
        <Card title="Import Configuration">
          <div style={{ marginBottom: 16 }}>
            <label>Province: </label>
            <strong>{datasets.find(d => d.provinceCode === selectedProvince)?.provinceName}</strong>
          </div>

          <div style={{ marginBottom: 16 }}>
            <label>City (Optional): </label>
            <Input
              style={{ width: 300 }}
              placeholder="TORONTO"
              value={filters.city}
              onChange={e => setFilters({ ...filters, city: e.target.value.toUpperCase() })}
            />
          </div>

          <div style={{ marginBottom: 16 }}>
            <label>Postal Code Prefix (Optional): </label>
            <Input
              style={{ width: 200 }}
              placeholder="M5"
              value={filters.postalCodePrefix}
              onChange={e => setFilters({ ...filters, postalCodePrefix: e.target.value.toUpperCase() })}
            />
          </div>

          <div style={{ marginBottom: 16 }}>
            <Checkbox
              checked={filters.residentialOnly}
              onChange={e => setFilters({ ...filters, residentialOnly: e.target.checked })}
            >
              Residential Only
            </Checkbox>
          </div>

          <Button
            type="primary"
            icon={<UploadOutlined />}
            onClick={startImport}
            loading={importing}
            disabled={importing}
          >
            Start Import
          </Button>
        </Card>
      )}

      {importing && progress && (
        <Card title="Import Progress" style={{ marginTop: 24 }}>
          <Progress percent={progress.percent} status="active" />
          <div style={{ marginTop: 16 }}>
            <p>Processed: {progress.processed.toLocaleString()} / {progress.total.toLocaleString()}</p>
            <p>Imported: {progress.imported.toLocaleString()}</p>
            <p>Skipped: {progress.skipped.toLocaleString()}</p>
            <p>Errors: {progress.errors.toLocaleString()}</p>
          </div>
        </Card>
      )}
    </div>
  );
};
```

### NAR Import Service - Full Implementation

```typescript
// nar-import.service.ts

import fs from 'fs/promises';
import { createReadStream } from 'fs';
import path from 'path';
import csvParser from 'csv-parser';
import proj4 from 'proj4';
import { prisma } from '@/config/database';
import { logger } from '@/utils/logger';

// Define EPSG:3347 (Statistics Canada Lambert)
proj4.defs('EPSG:3347',
  '+proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 ' +
  '+lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 ' +
  '+ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs'
);

const NAR_DATA_DIR = process.env.NAR_DATA_DIR || '/data';
const BATCH_SIZE = parseInt(process.env.NAR_BATCH_SIZE || '500', 10);

// csv-parser yields every field as a string, so numeric fields are
// typed as strings here and parsed explicitly where needed.
interface NARAddressRecord {
  ADDR_GUID: string;
  LOC_GUID: string;
  CIVIC_NO: string;
  OFFICIAL_STREET_NAME: string;
  POSTAL_CODE: string;
  MUNICIPALITY: string;
}

interface NARLocationRecord {
  LOC_GUID: string;
  BG_LATITUDE?: string;
  BG_LONGITUDE?: string;
  BG_X?: string;
  BG_Y?: string;
  FED_NUM: string;
  BU_USE: string;
  MUNICIPALITY: string;
}

interface ImportResult {
  imported: number;
  skipped: number;
  errors: number;
  total: number;
}

export class NARImportService {
  async importProvince(
    provinceCode: string,
    filters: {
      city?: string;
      postalCodePrefix?: string;
      cutId?: number;
      residentialOnly?: boolean;
    }
  ): Promise<ImportResult> {
    logger.info(`Starting NAR import for province ${provinceCode}`, { filters });

    // Load address files into memory map
    const addressMap = await this.loadAddressFiles(provinceCode, filters);

    // Process location file and import
    const result = await this.processLocationFile(provinceCode, addressMap, filters);

    logger.info(`NAR import complete for province ${provinceCode}`, result);
    return result;
  }

  private async loadAddressFiles(
    provinceCode: string,
    filters: { city?: string; postalCodePrefix?: string }
  ): Promise<Map<string, NARAddressRecord[]>> {
    const addressMap = new Map<string, NARAddressRecord[]>();

    const files = await fs.readdir(NAR_DATA_DIR);
    const addressFiles = files
      .filter(f => f.match(new RegExp(`^Address_${provinceCode}(?:_part_\\d+)?\\.csv$`)))
      .sort();

    for (const file of addressFiles) {
      logger.info(`Reading ${file}...`);
      const filePath = path.join(NAR_DATA_DIR, file);
      const parser = createReadStream(filePath).pipe(csvParser());

      for await (const row of parser) {
        // Apply filters
        if (filters.city && row.MUNICIPALITY !== filters.city) continue;
        if (filters.postalCodePrefix && !row.POSTAL_CODE.startsWith(filters.postalCodePrefix)) continue;

        const locGuid = row.LOC_GUID;
        if (!addressMap.has(locGuid)) {
          addressMap.set(locGuid, []);
        }
        addressMap.get(locGuid)!.push(row);
      }
    }

    logger.info(`Loaded ${addressMap.size} unique locations`);
    return addressMap;
  }

  private async processLocationFile(
    provinceCode: string,
    addressMap: Map<string, NARAddressRecord[]>,
    filters: { cutId?: number; residentialOnly?: boolean }
  ): Promise<ImportResult> {
    const locationFile = `Location_${provinceCode}.csv`;
    const filePath = path.join(NAR_DATA_DIR, locationFile);
    const parser = createReadStream(filePath).pipe(csvParser());

    // Load the cut geometry once, rather than querying per record
    const cut = filters.cutId
      ? await prisma.cut.findUnique({ where: { id: filters.cutId } })
      : null;

    let batch: any[] = [];
    const stats: ImportResult = { imported: 0, skipped: 0, errors: 0, total: 0 };

    for await (const row of parser) {
      stats.total++;

      const locGuid = row.LOC_GUID;
      const addresses = addressMap.get(locGuid);

      if (!addresses || addresses.length === 0) {
        stats.skipped++;
        continue;
      }

      // Residential filter
      if (filters.residentialOnly && parseInt(row.BU_USE, 10) !== 1) {
        stats.skipped++;
        continue;
      }

      // Convert coordinates
      const coords = this.getCoordinates(row);
      if (!coords) {
        stats.errors++;
        continue;
      }

      // Cut filter (if specified)
      if (cut && !this.isPointInPolygon([coords.longitude, coords.latitude], cut.geojson)) {
        stats.skipped++;
        continue;
      }

      batch.push({ location: row, addresses, coords });

      if (batch.length >= BATCH_SIZE) {
        await this.importBatch(batch);
        stats.imported += batch.length;
        batch = [];
      }
    }

    if (batch.length > 0) {
      await this.importBatch(batch);
      stats.imported += batch.length;
    }

    return stats;
  }

  private getCoordinates(row: NARLocationRecord): { latitude: number; longitude: number } | null {
    // Try BG_X/BG_Y conversion (fields arrive as strings from csv-parser)
    const x = parseFloat(row.BG_X ?? '');
    const y = parseFloat(row.BG_Y ?? '');
    if (!Number.isNaN(x) && !Number.isNaN(y)) {
      try {
        const [lng, lat] = proj4('EPSG:3347', 'WGS84', [x, y]);
        if (lat >= 41 && lat <= 84 && lng >= -141 && lng <= -52) {
          return { latitude: lat, longitude: lng };
        }
      } catch (error) {
        logger.warn('Coordinate conversion failed:', error);
      }
    }

    // Fallback to BG_LATITUDE/BG_LONGITUDE
    const latitude = parseFloat(row.BG_LATITUDE ?? '');
    const longitude = parseFloat(row.BG_LONGITUDE ?? '');
    if (!Number.isNaN(latitude) && !Number.isNaN(longitude)) {
      return { latitude, longitude };
    }

    return null;
  }

  private async importBatch(batch: any[]): Promise<void> {
    await prisma.$transaction(async (tx) => {
      for (const item of batch) {
        const location = await tx.location.upsert({
          where: { locGuid: item.location.LOC_GUID },
          update: {
            address: this.formatAddress(item.addresses[0]),
            latitude: item.coords.latitude,
            longitude: item.coords.longitude,
            postalCode: item.addresses[0].POSTAL_CODE,
            federalDistrict: item.location.FED_NUM,
            buildingUse: parseInt(item.location.BU_USE, 10),
            municipality: item.location.MUNICIPALITY
          },
          create: {
            locGuid: item.location.LOC_GUID,
            address: this.formatAddress(item.addresses[0]),
            latitude: item.coords.latitude,
            longitude: item.coords.longitude,
            postalCode: item.addresses[0].POSTAL_CODE,
            federalDistrict: item.location.FED_NUM,
            buildingUse: parseInt(item.location.BU_USE, 10),
            municipality: item.location.MUNICIPALITY,
            geocodeConfidence: 100,
            geocodeProvider: 'NAR'
          }
        });

        for (const addr of item.addresses) {
          await tx.address.upsert({
            where: { addrGuid: addr.ADDR_GUID },
            update: {},
            create: {
              addrGuid: addr.ADDR_GUID,
              locationId: location.id,
              unitNumber: addr.CIVIC_NO
            }
          });
        }
      }
    });
  }

  private formatAddress(addr: NARAddressRecord): string {
    return `${addr.CIVIC_NO} ${addr.OFFICIAL_STREET_NAME}`.trim();
  }

  private isPointInPolygon(point: [number, number], geojson: any): boolean {
    // Ray casting against the polygon's outer ring ([lng, lat] order),
    // mirroring the implementation in spatial.ts
    const ring: [number, number][] = geojson.coordinates[0];
    const [px, py] = point;
    let inside = false;
    for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
      const [xi, yi] = ring[i];
      const [xj, yj] = ring[j];
      if ((yi > py) !== (yj > py) &&
          px < ((xj - xi) * (py - yi)) / (yj - yi) + xi) {
        inside = !inside;
      }
    }
    return inside;
  }
}
```
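
The Canada bounding-box check inside `getCoordinates` is easy to isolate and unit-test on its own. A minimal sketch (the function name `isWithinCanada` is illustrative, not part of the service's API):

```typescript
// Rough WGS84 bounding box used by getCoordinates() to reject bad projections.
function isWithinCanada(lat: number, lng: number): boolean {
  return lat >= 41 && lat <= 84 && lng >= -141 && lng <= -52;
}

console.log(isWithinCanada(43.6532, -79.3832)); // Toronto → true
console.log(isWithinCanada(51.5074, -0.1278)); // London, UK → false
```

A coarse box like this cannot distinguish northern US points from Canadian ones; it only catches wildly wrong conversions, which is all the import needs.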

## Troubleshooting

### Problem: No datasets found

**Symptoms:**

- GET /api/locations/nar/datasets returns an empty array
- "No datasets available" message in admin

**Solutions:**

1. **Verify NAR_DATA_DIR path:**
   ```bash
   echo $NAR_DATA_DIR
   ls -la /data
   ```

2. **Check Docker volume mount:**
   ```yaml
   # docker-compose.yml
   services:
     api:
       volumes:
         - ./data:/data:ro
   ```

3. **Verify file naming convention:**
   ```bash
   # Correct:
   Address_35_part_1.csv
   Location_35.csv

   # Incorrect:
   address_35.csv   # Lowercase
   Addresses_35.csv # Plural
   Address35.csv    # No underscore
   ```

4. **Check file permissions:**
   ```bash
   chmod 644 /data/Address_*.csv
   chmod 644 /data/Location_*.csv
   ```
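
The naming rule in solution 3 is exactly what the service's file scanner enforces when reading `NAR_DATA_DIR`. A quick standalone check (the helper name `isAddressFile` is illustrative):

```typescript
// Mirrors the Address-file regex used by loadAddressFiles() in the service.
const isAddressFile = (name: string, province: string): boolean =>
  new RegExp(`^Address_${province}(?:_part_\\d+)?\\.csv$`).test(name);

console.log(isAddressFile('Address_35_part_1.csv', '35')); // true
console.log(isAddressFile('address_35.csv', '35'));        // false (lowercase)
console.log(isAddressFile('Addresses_35.csv', '35'));      // false (plural)
```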

### Problem: Coordinate conversion errors

**Symptoms:**

- Many locations skipped during import
- "Converted coordinates outside Canada" warnings
- Null latitude/longitude in database

**Solutions:**

1. **Verify BG_X/BG_Y values:**
   ```typescript
   // Valid range for Canada (EPSG:3347):
   // BG_X: ~400,000 to 3,000,000
   // BG_Y: ~4,600,000 to 9,000,000

   console.log('BG_X:', narRecord.BG_X); // Should be 6-7 digits
   console.log('BG_Y:', narRecord.BG_Y); // Should be 7 digits
   ```

2. **Test with known coordinates:**
   ```typescript
   // Round-trip a known WGS84 point through EPSG:3347 as a sanity check
   const [x, y] = proj4('WGS84', 'EPSG:3347', [-79.3832, 43.6532]); // Toronto City Hall
   const [lng, lat] = proj4('EPSG:3347', 'WGS84', [x, y]);
   console.log('Expected: 43.6532, -79.3832');
   console.log('Got:', lat, lng);
   ```


3. **Fallback to BG_LATITUDE/BG_LONGITUDE:**
   ```typescript
   // If BG_X/BG_Y missing or invalid, use lat/lng directly
   if (!coords && narRecord.BG_LATITUDE && narRecord.BG_LONGITUDE) {
     coords = {
       latitude: narRecord.BG_LATITUDE,
       longitude: narRecord.BG_LONGITUDE
     };
   }
   ```

4. **Check proj4 definition:**
   ```bash
   npm list proj4
   # Ensure version 2.8.0+
   ```

### Problem: Import very slow (> 30 min for 100k records)

**Symptoms:**

- Import hangs on large provinces
- Memory usage grows over time
- Database connection timeouts

**Solutions:**

1. **Increase batch size:**
   ```env
   NAR_BATCH_SIZE=1000 # Default: 500
   ```

2. **Use streaming instead of loading all addresses:**
   ```typescript
   // DON'T do this (loads all into memory):
   const allAddresses = await readAllAddressFiles();

   // DO this (stream and process incrementally):
   for await (const addressBatch of streamAddressFiles()) {
     processBatch(addressBatch);
   }
   ```

3. **Optimize database indexes:**
   ```sql
   CREATE INDEX CONCURRENTLY idx_locations_loc_guid ON "Location"("locGuid");
   CREATE INDEX CONCURRENTLY idx_addresses_addr_guid ON "Address"("addrGuid");
   ```

4. **Disable geocoding during import:**
   ```typescript
   // Skip the geocoding service since NAR already has coordinates
   geocodeConfidence: 100,
   geocodeProvider: 'NAR'
   // No call to geocodingService.geocode()
   ```

5. **Use worker threads for parallel processing:**
   ```typescript
   import { Worker } from 'worker_threads';

   const workers: Worker[] = [];
   for (let i = 0; i < 4; i++) {
     const worker = new Worker('./nar-import-worker.js');
     workers.push(worker);
   }
   ```
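
The batching half of the streaming pattern in solution 2 can be captured in a small generic helper. A sketch only: `chunked` is an illustrative name, not an existing utility in the codebase, and `streamAddressFiles`/`processBatch` remain assumed:

```typescript
// Group any async stream into fixed-size batches without buffering
// the whole input in memory.
async function* chunked<T>(items: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  let buf: T[] = [];
  for await (const item of items) {
    buf.push(item);
    if (buf.length >= size) {
      yield buf;
      buf = [];
    }
  }
  if (buf.length > 0) yield buf; // flush the final partial batch
}
```

Because csv-parser streams are async-iterable, a helper like this could feed `importBatch` directly while keeping peak memory bounded to one batch.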

### Problem: Duplicate LOC_GUID errors

**Symptoms:**

- Unique constraint violation on locGuid
- Import fails mid-process
- "Duplicate key value violates unique constraint" error

**Solutions:**

1. **Use UPSERT instead of INSERT:**
   ```typescript
   await prisma.location.upsert({
     where: { locGuid: narRecord.LOC_GUID },
     update: { /* update fields */ },
     create: { /* create fields */ }
   });
   ```

2. **Check for corrupt NAR files:**
   ```bash
   # Count unique LOC_GUIDs
   cut -d, -f2 Address_35_part_1.csv | sort | uniq | wc -l

   # Check for duplicates
   cut -d, -f2 Address_35_part_1.csv | sort | uniq -d
   ```

3. **Clean up partial imports:**
   ```sql
   -- Delete locations from failed import
   DELETE FROM "Location" WHERE "geocodeProvider" = 'NAR' AND "createdAt" > '2025-02-13';
   ```

4. **Implement transaction rollback on error:**
   ```typescript
   try {
     await prisma.$transaction(async (tx) => {
       // Import batch
     });
   } catch (error) {
     logger.error('Batch failed, rolling back:', error);
     // Transaction is automatically rolled back
   }
   ```

## Performance Considerations

### Import Speed

**Benchmarks:**

| Province | Records | Files | Time | Records/Second |
|----------|---------|-------|------|----------------|
| PEI (11) | 15,000 | 1 | 12s | 1,250 |
| Nova Scotia (12) | 85,000 | 1 | 1m 10s | 1,214 |
| Quebec (24) | 850,000 | 6 | 11m 20s | 1,250 |
| Ontario (35) | 1,200,000 | 3 | 14m 30s | 1,379 |

**Factors:**

- Batch size: 500 (optimal for most systems)
- Coordinate conversion: ~0.1ms per record
- Database write: ~0.5ms per location (depends on disk speed)
- Total overhead: ~0.7ms per record
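
Those per-record figures make rough capacity planning straightforward. A sketch (the helper name is illustrative):

```typescript
// Back-of-envelope import-time estimate from the ~0.7ms/record overhead above.
const estimateImportSeconds = (records: number, msPerRecord = 0.7): number =>
  (records * msPerRecord) / 1000;

console.log(Math.round(estimateImportSeconds(1_200_000))); // ~840s, close to Ontario's measured 14m30s
```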

### Memory Usage

**Peak Memory:**

- Address map (in-memory): ~200MB per 100k records
- CSV parser buffer: ~10MB
- Batch buffer: ~5MB (500 records)
- Total: ~220MB per 100k records

**Optimization:**

- Stream address files instead of loading all
- Process location file in chunks
- Clear batch after each commit
- Limit concurrent transactions

### Database Load

**Transaction Rate:**

- 1 transaction per batch (500 records)
- ~2-3 transactions/second
- Low database CPU (~10-20%)
- Moderate disk I/O (sequential writes)

**Connection Pool:**

```env
# Prisma sets the pool size through the connection string;
# connection_limit is not a schema.prisma datasource field.
DATABASE_URL="postgresql://user:password@host:5432/db?connection_limit=10"
```

## Related Documentation

### Backend Documentation

- **NAR Import Service:** `api/src/modules/map/locations/nar-import.service.ts`
  - File scanning
  - Streaming CSV parser
  - Coordinate conversion
  - Batch import

- **NAR Import Routes:** `api/src/modules/map/locations/nar-import.routes.ts`
  - Dataset discovery
  - Import job creation
  - Progress tracking

- **Locations Service:** `api/src/modules/map/locations/locations.service.ts`
  - Location CRUD
  - Geocoding integration

### Frontend Documentation

- **Locations Page:** `admin/src/pages/LocationsPage.tsx`
  - NAR Import tab
  - Dataset selection
  - Filter configuration
  - Progress monitoring

### Database Documentation

- **Location Model:** `api/prisma/schema.prisma`
  - NAR-specific fields
  - locGuid unique constraint
  - Federal district index

- **Address Model:** `api/prisma/schema.prisma`
  - addrGuid unique constraint
  - Location foreign key

### External Resources

- **Elections Canada NAR:** https://www.elections.ca/content.aspx?section=res&dir=cir/tech/nar&document=index&lang=e
- **EPSG:3347 Definition:** https://epsg.io/3347
- **Proj4 Documentation:** https://github.com/proj4js/proj4js
- **NAR Data Dictionary:** Elections Canada NAR Technical Documentation (PDF)