initial commit: ocm_autopull docs
83
README.md
Normal file
@ -0,0 +1,83 @@
|
||||
# ocm_autopull
|
||||
|
||||
The following instructions describe how to process all OCM data (City of Edmonton contact form data) into a Google Sheet using Apps Script.
|
||||
|
||||
Using internal resources, this process builds a single spreadsheet with all available OCM emails from a user's account.
|
||||
|
||||
## How To
|
||||
|
||||
### Step 1 - Create Label
|
||||
|
||||
In Gmail, create a new label called `ocm_autopull`.
|
||||
|
||||
Search your email and label all OCM emails. For example:
|
||||
|
||||
To: `councillor's email`
|
||||
|
||||
Has the words: This message is intended for Councillor `councillor name` and their staff.
|
||||
|
||||
Select all those emails and label them with the new label you just created.
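The same search can be typed into the Gmail search box as a single query. The helper below is only an illustrative sketch: the function name `buildOcmSearchQuery` and the example address are hypothetical, and it simply combines Gmail's `to:` operator with a quoted exact phrase.

```javascript
// Hypothetical helper: builds a Gmail search string for OCM emails.
function buildOcmSearchQuery(councillorEmail, councillorName) {
  const phrase = 'This message is intended for Councillor ' +
    councillorName + ' and their staff.';
  // "to:" restricts by recipient; double quotes request an exact phrase.
  return 'to:' + councillorEmail + ' "' + phrase + '"';
}

// Example with placeholder values; paste the printed text into Gmail search.
console.log(buildOcmSearchQuery('councillor@example.com', 'Jane Doe'));
```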
|
||||
|
||||
### Step 2 - Create Apps Script Project
|
||||
|
||||
In your web browser, go to:
|
||||
|
||||
```
|
||||
https://script.google.com/home/
|
||||
```
|
||||
|
||||
Then create a new project called `ocm_autopull`.
|
||||
|
||||
### Step 3 - Add Script to ocm_autopull project
|
||||
|
||||
In your new project, delete the default function and replace it with the code from here:
|
||||
|
||||
#### [code](code.md)
|
||||
|
||||

|
||||
|
||||
Remember to click save after you have copied your code over.
|
||||
|
||||
#### Configure
|
||||
|
||||
You can configure the script using the settings at the top of the code. If you already have a Gmail label for OCM emails, you can update the label in the configuration section of the script:
|
||||
|
||||

|
||||
|
||||
### Step 4 - Test
|
||||
|
||||
After saving, you will have new functions available. The first function to run is `testSmallSample`. Select the function, then click run:
|
||||
|
||||

|
||||
|
||||
This will test a few emails from your `ocm_autopull` label and provide a readout.
|
||||
|
||||

|
||||
|
||||
Review the sheet it produces and, if it looks good, continue to the next step.
|
||||
|
||||
### Step 5 - Run Script
|
||||
|
||||
If the test passes, you are ready to run the script. Select the function called `downloadEmailsByLabel` and click run:
|
||||
|
||||

|
||||
|
||||
**Now wait for the script to run.** It will take a few minutes, averaging about one minute per hundred emails.
|
||||
|
||||
The script also runs in 5-minute batches so that it stays compliant with Google's requirements. Sometimes a batch will be shorter than 5 minutes; this is normal.
|
||||
|
||||
When you click `run`, the script will output the location of the sheet. You can observe this sheet, and its logs, to see data being added in real time.
|
||||
|
||||

|
||||
|
||||
### Failure Case
|
||||
|
||||
On occasion, Google Apps Script runs can fail. If this happens, you should be able to click `run` again and the script will continue processing emails as normal.
|
||||
|
||||

|
||||
|
||||
#### Total Breakage
|
||||
|
||||
If the script fails entirely, delete the output spreadsheet and try running it again.
|
||||
|
||||
|
||||
352
code_function.md
Normal file
@ -0,0 +1,352 @@
|
||||
# Gmail to Google Sheets Export Script - Function Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
This Google Apps Script automatically exports OCM (Office of City Manager) contact form submissions from Gmail to Google Sheets. The script processes emails with a specific label, parses contact form data, and creates a structured spreadsheet with real-time output and checkpoint/resume functionality.
|
||||
|
||||
## Configuration Constants
|
||||
|
||||
### Email Processing
|
||||
- `LABEL_NAME`: Gmail label to process ("ocm_autopull")
|
||||
- `BATCH_SIZE`: Number of emails to write before updating checkpoint (10)
|
||||
- `MAX_EXECUTION_TIME`: Maximum script runtime before checkpoint (5 minutes)
|
||||
|
||||
### Spreadsheet Setup
|
||||
- `SPREADSHEET_NAME_PREFIX`: Prefix for generated spreadsheet names
|
||||
- `MAIN_SHEET_NAME`: Name of the main data sheet
|
||||
- `ERROR_SHEET_NAME`: Name of the error logging sheet
|
||||
- `LOG_SHEET_NAME`: Name of the processing log sheet
|
||||
|
||||
### Data Validation
|
||||
- `REQUIRED_OCM_PHRASES`: Keywords that must be present in OCM emails
|
||||
- `REPLY_INDICATORS`: Subject line patterns that indicate replies/forwards
|
||||
- `SKIP_SENDER_DOMAINS`: Email domains to skip (e.g., "edmonton.ca")
|
||||
- `EXCLUDE_COUNTRIES`: Countries to exclude from full address construction
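As a rough illustration of how these constants might look at the top of the script, here is a minimal sketch. Only the label name, batch size, 5-minute limit, and the `edmonton.ca` and `Canada` examples come from this document; every value marked "assumed" is a placeholder, and the millisecond unit for `MAX_EXECUTION_TIME` is also an assumption.

```javascript
// Email processing
const LABEL_NAME = 'ocm_autopull';          // Gmail label to process
const BATCH_SIZE = 10;                      // emails written between checkpoint updates
const MAX_EXECUTION_TIME = 5 * 60 * 1000;   // 5 minutes (milliseconds assumed)

// Spreadsheet setup (all four names are assumed placeholders)
const SPREADSHEET_NAME_PREFIX = 'OCM Export';
const MAIN_SHEET_NAME = 'OCM Data';
const ERROR_SHEET_NAME = 'Errors';
const LOG_SHEET_NAME = 'Log';

// Data validation
const REQUIRED_OCM_PHRASES = ['This message is intended for Councillor']; // assumed
const REPLY_INDICATORS = ['re:', 'fw:', 'fwd:'];                          // assumed
const SKIP_SENDER_DOMAINS = ['edmonton.ca'];
const EXCLUDE_COUNTRIES = ['Canada'];
```

To point the script at an existing label, only `LABEL_NAME` should need to change.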
|
||||
|
||||
---
|
||||
|
||||
## Main Functions
|
||||
|
||||
### `downloadEmailsByLabel()`
|
||||
**Primary export function that orchestrates the entire process**
|
||||
|
||||
**Purpose**: Main entry point for exporting emails by label to Google Sheets with real-time output and checkpoint/resume capability.
|
||||
|
||||
**Process**:
|
||||
1. Checks for existing checkpoint to determine if resuming or starting fresh
|
||||
2. Creates or connects to spreadsheet and initializes sheets
|
||||
3. Scans Gmail for all threads with the specified label
|
||||
4. Processes each thread's root email sequentially
|
||||
5. Saves checkpoints periodically and on timeout
|
||||
6. Formats final spreadsheet and sends completion notification
|
||||
|
||||
**Returns**: Spreadsheet URL
|
||||
|
||||
**Key Features**:
|
||||
- Real-time data output (writes immediately to sheets)
|
||||
- Automatic checkpoint saving every 10 emails
|
||||
- Time-based checkpointing (5-minute limit)
|
||||
- Automatic resume scheduling on timeout
|
||||
- Progress logging and error handling
|
||||
- Email notification on completion
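The interaction between the per-batch checkpoint and the time limit can be sketched as a plain loop. This is not the script's actual code: `runWithCheckpoints` and its `opts` fields are hypothetical names, and the clock, the per-item work, and the checkpoint writer are injected so the sketch runs outside Apps Script.

```javascript
// Sketch of a time-budgeted loop with periodic checkpoints.
// opts: { now, maxExecutionMs, batchSize, processOne, saveCheckpoint }
function runWithCheckpoints(items, startIndex, opts) {
  const startedAt = opts.now();
  let index = startIndex;
  while (index < items.length) {
    // Stop before starting new work once the time budget is used up.
    if (opts.now() - startedAt >= opts.maxExecutionMs) {
      opts.saveCheckpoint(index);
      return { done: false, nextIndex: index };
    }
    opts.processOne(items[index]);
    index += 1;
    // Periodic checkpoint after every full batch in this run.
    if ((index - startIndex) % opts.batchSize === 0) {
      opts.saveCheckpoint(index);
    }
  }
  return { done: true, nextIndex: index };
}
```

A run that returns `done: false` would be followed by a later run started at `nextIndex`, which is the role the resume trigger plays in the real script.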
|
||||
|
||||
---
|
||||
|
||||
## Sheet Management Functions
|
||||
|
||||
### `initializeSheets(spreadsheet, isResume)`
|
||||
**Creates or connects to spreadsheet sheets based on operation type**
|
||||
|
||||
**Parameters**:
|
||||
- `spreadsheet`: The Google Spreadsheet object
|
||||
- `isResume`: Boolean indicating if this is a resume operation
|
||||
|
||||
**Fresh Start Mode** (`isResume = false`):
|
||||
- Creates new sheets with unique timestamped names
|
||||
- Clears any existing content
|
||||
- Sets up headers and formatting
|
||||
- Freezes header rows
|
||||
|
||||
**Resume Mode** (`isResume = true`):
|
||||
- Connects to existing sheets without clearing data
|
||||
- Preserves all previously processed data
|
||||
- Creates missing sheets as fallback
|
||||
- Logs number of existing data rows
|
||||
|
||||
**Creates Three Sheets**:
|
||||
1. **Main Data Sheet**: OCM contact form data
|
||||
2. **Error Sheet**: Processing errors and exceptions
|
||||
3. **Log Sheet**: Detailed processing activity log
|
||||
|
||||
### `getUniqueSheetNames()`
|
||||
**Generates unique sheet names with timestamps to avoid conflicts**
|
||||
|
||||
**Returns**: Object containing unique names for main, error, and log sheets
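One simple way to produce such names is to append a compact timestamp to each base name. The sketch below is an assumption about the approach, not the script's code; the base names, the property names, and the UTC `yyyyMMdd_HHmmss` format are all hypothetical.

```javascript
// Hypothetical sketch: timestamped sheet names (UTC, yyyyMMdd_HHmmss).
function getUniqueSheetNamesSketch(date) {
  const stamp = date.toISOString()   // e.g. 2024-01-02T03:04:05.000Z
    .slice(0, 19)                    // drop milliseconds and the trailing Z
    .replace(/[-:]/g, '')            // remove date and time separators
    .replace('T', '_');
  return {
    main: 'OCM Data ' + stamp,
    errors: 'Errors ' + stamp,
    log: 'Log ' + stamp,
  };
}

console.log(getUniqueSheetNamesSketch(new Date()));
```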
|
||||
|
||||
### `writeRowToSheet(sheet, rowData)`
|
||||
**Writes a single row of data immediately to the specified sheet**
|
||||
|
||||
**Parameters**:
|
||||
- `sheet`: Target Google Sheet object
|
||||
- `rowData`: Array of values to write
|
||||
|
||||
**Purpose**: Enables real-time output by writing each processed email immediately rather than batching.
|
||||
|
||||
### `formatSheet(sheet, totalRows)`
|
||||
**Applies formatting to improve spreadsheet readability**
|
||||
|
||||
**Formatting Applied**:
|
||||
- Auto-resizes all columns
|
||||
- Sets specific column widths for OCM data fields
|
||||
- Applies alternating row banding (light grey)
|
||||
- Formats date columns with consistent date/time format
|
||||
- Optimizes layout for OCM contact form structure
|
||||
|
||||
---
|
||||
|
||||
## Data Processing Functions
|
||||
|
||||
### `getAllThreadsForLabel(label)`
|
||||
**Retrieves ALL email threads for a Gmail label, handling pagination**
|
||||
|
||||
**Parameters**:
|
||||
- `label`: Gmail Label object
|
||||
|
||||
**Process**:
|
||||
- Handles Gmail's 500-thread pagination limit
|
||||
- Retrieves threads in batches with progress logging
|
||||
- Includes brief pauses between batches to avoid rate limits
|
||||
|
||||
**Returns**: Array of all Gmail Thread objects
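The pagination described above can be sketched as follows. It assumes the label object exposes `getThreads(start, max)`, as Apps Script's `GmailLabel` does; the function name and the optional `pause` callback are hypothetical, with `pause` standing in for a short sleep between batches.

```javascript
// Sketch: page through a label's threads 500 at a time.
function getAllThreadsSketch(label, pause) {
  const PAGE_SIZE = 500; // page limit stated in this document
  const all = [];
  let start = 0;
  while (true) {
    const page = label.getThreads(start, PAGE_SIZE);
    all.push(...page);
    // A short (or empty) page means there is nothing left to fetch.
    if (page.length < PAGE_SIZE) break;
    start += PAGE_SIZE;
    if (pause) pause(); // e.g. a brief sleep to stay under rate limits
  }
  return all;
}
```

In Apps Script the call might look like `getAllThreadsSketch(GmailApp.getUserLabelByName('ocm_autopull'), () => Utilities.sleep(100))`, where the 100 ms pause is an arbitrary example value.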
|
||||
|
||||
### `parseOCMContactForm(message, thread, threadNumber)`
|
||||
**Parses OCM contact form data from email body using regex patterns**
|
||||
|
||||
**Parameters**:
|
||||
- `message`: Gmail Message object
|
||||
- `thread`: Gmail Thread object
|
||||
- `threadNumber`: Sequential thread number for logging
|
||||
|
||||
**Extracted Fields**:
|
||||
- Confirmation Number
|
||||
- Submission Date
|
||||
- Personal Info: First Name, Last Name, Email, Phone, Fax
|
||||
- Organization details
|
||||
- Address: Street, City, Province, Country, Postal Code
|
||||
- Subject and Comments
|
||||
- Gmail metadata: Date, From, Subject, Thread ID, Message ID
|
||||
|
||||
**Data Cleaning**:
|
||||
- Removes email quote markers and formatting artifacts
|
||||
- Normalizes whitespace and removes colons
|
||||
- Cleans email addresses from mailto links
|
||||
- Handles empty organization and fax fields
|
||||
- Strips footer text from comments
|
||||
|
||||
**Filtering**:
|
||||
- Skips obvious replies/forwards based on subject indicators
|
||||
- Excludes emails from specified domains (e.g., edmonton.ca)
|
||||
- Processes only root emails from each thread
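A filter along these lines could be written as a small predicate. This is a sketch under assumptions: the reply prefixes are guesses at what `REPLY_INDICATORS` contains, and the function and constant names are hypothetical. Only the `edmonton.ca` example comes from this document.

```javascript
const REPLY_INDICATORS_SKETCH = ['re:', 'fw:', 'fwd:']; // assumed values
const SKIP_DOMAINS_SKETCH = ['edmonton.ca'];            // example from this document

// Returns true when an email should be skipped.
function shouldSkipEmail(subject, from) {
  const s = (subject || '').trim().toLowerCase();
  if (REPLY_INDICATORS_SKETCH.some((prefix) => s.startsWith(prefix))) {
    return true; // looks like a reply or forward
  }
  // Pull the domain out of "Name <user@domain>" or "user@domain".
  const match = (from || '').toLowerCase().match(/@([a-z0-9.-]+)/);
  const domain = match ? match[1] : '';
  return SKIP_DOMAINS_SKETCH.some((d) => domain === d || domain.endsWith('.' + d));
}
```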
|
||||
|
||||
**Error Handling**:
|
||||
- Returns error row with available metadata if parsing fails
|
||||
- Logs detailed error information for troubleshooting
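To make the extraction and cleaning steps above concrete, here is a minimal sketch. It assumes, without confirmation from the source, that each field appears on its own line as `Label: value`, possibly prefixed by `>` quote markers. The function names and the sample body are hypothetical.

```javascript
// ASSUMPTION: one field per line, formatted as "Label: value".
function extractField(body, label) {
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Allow leading quote markers and spaces, then capture the rest of the line.
  const pattern = new RegExp('^[> \\t]*' + escaped + '[ \\t]*:[ \\t]*(.*)$', 'mi');
  const match = body.match(pattern);
  return match ? match[1].trim() : '';
}

function parseContactFormSketch(body) {
  return {
    firstName: extractField(body, 'First Name'),
    lastName: extractField(body, 'Last Name'),
    // Strip a leading "mailto:" left over from link formatting.
    email: extractField(body, 'Email').replace(/^mailto:/i, ''),
    fax: extractField(body, 'Fax'), // empty fields come back as ''
  };
}

const sampleBody = [
  '> First Name: Jane',
  '> Last Name:   Doe',
  'Email: mailto:jane@example.com',
  'Fax:',
].join('\n');

console.log(parseContactFormSketch(sampleBody));
```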
|
||||
|
||||
### `constructFullAddress(street, streetCont, city, province, country, postal)`
|
||||
**Builds formatted full address from individual components**
|
||||
|
||||
**Address Construction Rules**:
|
||||
- Validates and cleans each component
|
||||
- Excludes specified countries (e.g., "Canada")
|
||||
- Removes field prefixes that leaked through parsing
|
||||
- Joins components with commas
|
||||
- Handles empty or duplicate fields gracefully
|
||||
|
||||
**Returns**: Formatted address string
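The rules above can be illustrated with a small pure function. This is a simplified sketch rather than the script's implementation: it trims components, drops an excluded country, removes empty and repeated parts, and joins the rest with commas. It does not attempt the prefix cleanup, and the names ending in `Sketch` are hypothetical.

```javascript
const EXCLUDED_COUNTRIES_SKETCH = ['canada']; // "Canada" is the document's example

function constructFullAddressSketch(street, streetCont, city, province, country, postal) {
  const clean = (value) => (value || '').trim();
  const countryClean = clean(country);
  const keepCountry = !EXCLUDED_COUNTRIES_SKETCH.includes(countryClean.toLowerCase());
  const parts = [
    clean(street), clean(streetCont), clean(city), clean(province),
    keepCountry ? countryClean : '', clean(postal),
  ];
  // Drop empty components and exact repeats, keeping the first occurrence.
  const kept = parts.filter((part, i) => part !== '' && parts.indexOf(part) === i);
  return kept.join(', ');
}

console.log(constructFullAddressSketch('123 Main St', '', 'Edmonton', 'AB', 'Canada', 'T5J 0A1'));
// prints "123 Main St, Edmonton, AB, T5J 0A1"
```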
|
||||
|
||||
---
|
||||
|
||||
## Logging and Error Handling
|
||||
|
||||
### `logMessage(action, threadNumber, message, details)`
|
||||
**Logs processing activities to the log sheet and console**
|
||||
|
||||
**Parameters**:
|
||||
- `action`: Type of action (START, PROCESS, SUCCESS, ERROR, etc.)
|
||||
- `threadNumber`: Current thread number
|
||||
- `message`: Primary log message
|
||||
- `details`: Additional details (optional)
|
||||
|
||||
**Log Entry Format**:
|
||||
- Timestamp
|
||||
- Thread Number
|
||||
- Action Type
|
||||
- Message with Details
|
||||
- Status (SUCCESS/ERROR)
|
||||
|
||||
### `logError(threadNumber, errorType, errorDetails, emailDate, emailFrom)`
|
||||
**Logs errors to the dedicated error sheet**
|
||||
|
||||
**Error Entry Format**:
|
||||
- Thread Number
|
||||
- Error Type
|
||||
- Detailed Error Description
|
||||
- Email Date and Sender
|
||||
- Error Timestamp
|
||||
|
||||
**Common Error Types**:
|
||||
- Parsing Error: Failed to extract form data
|
||||
- Thread Processing Error: General thread handling failure
|
||||
- Sheet Writing Error: Failed to write to spreadsheet
|
||||
|
||||
---
|
||||
|
||||
## Checkpoint and Resume System
|
||||
|
||||
### `saveCheckpoint(checkpointData)`
|
||||
**Saves current progress to Google Apps Script Properties**
|
||||
|
||||
**Checkpoint Data Includes**:
|
||||
- Spreadsheet ID
|
||||
- Label name
|
||||
- Total thread count
|
||||
- Number of threads processed
|
||||
- Number of emails found
|
||||
- Start timestamp
|
||||
|
||||
### `getCheckpoint()`
|
||||
**Retrieves saved checkpoint data**
|
||||
|
||||
**Returns**: Checkpoint object or null if no checkpoint exists
|
||||
|
||||
### `clearCheckpoint()`
|
||||
**Removes checkpoint data when export completes**
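The three checkpoint functions can be pictured as thin wrappers around a key-value store. The sketch below assumes the store offers `setProperty`, `getProperty`, and `deleteProperty`, which is the interface of Apps Script's script properties. The key name, the function names, and the in-memory stand-in are hypothetical.

```javascript
const CHECKPOINT_KEY = 'ocm_export_checkpoint'; // hypothetical key name

function saveCheckpointSketch(store, data) {
  store.setProperty(CHECKPOINT_KEY, JSON.stringify(data));
}

function getCheckpointSketch(store) {
  const raw = store.getProperty(CHECKPOINT_KEY);
  return raw ? JSON.parse(raw) : null; // null when no checkpoint exists
}

function clearCheckpointSketch(store) {
  store.deleteProperty(CHECKPOINT_KEY);
}

// In-memory stand-in so the sketch runs outside Apps Script.
function createMemoryStore() {
  const data = {};
  return {
    setProperty: (key, value) => { data[key] = String(value); },
    getProperty: (key) => (key in data ? data[key] : null),
    deleteProperty: (key) => { delete data[key]; },
  };
}
```

Inside Apps Script, `PropertiesService.getScriptProperties()` could be passed as `store`.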
|
||||
|
||||
### `scheduleResume()`
|
||||
**Creates time-based trigger to automatically resume processing**
|
||||
|
||||
**Process**:
|
||||
- Deletes any existing resume triggers
|
||||
- Creates new trigger to run `resumeExport()` after specified delay
|
||||
- Default delay: 1 minute
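A sketch of this trigger handling is shown below. It takes the trigger service as a parameter so it can be exercised with a stand-in; in Apps Script that parameter would be `ScriptApp`. The function name `scheduleResumeSketch` is hypothetical.

```javascript
// Sketch: replace any pending resume trigger with a fresh one.
function scheduleResumeSketch(scriptApp, delayMs) {
  // Remove existing triggers that point at resumeExport.
  scriptApp.getProjectTriggers()
    .filter((trigger) => trigger.getHandlerFunction() === 'resumeExport')
    .forEach((trigger) => scriptApp.deleteTrigger(trigger));
  // Schedule a one-off run after the delay (in milliseconds).
  return scriptApp.newTrigger('resumeExport')
    .timeBased()
    .after(delayMs)
    .create();
}

// In Apps Script: scheduleResumeSketch(ScriptApp, 60 * 1000); // 1-minute delay
```

Deleting old triggers first keeps duplicate resume runs from piling up, which is one of the situations `clearCheckpointAndTriggers()` exists to clean up.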
|
||||
|
||||
### `resumeExport()`
|
||||
**Triggered function that resumes export from checkpoint**
|
||||
|
||||
**Process**:
|
||||
- Cleans up the trigger that called it
|
||||
- Calls `downloadEmailsByLabel()` to continue processing
|
||||
|
||||
---
|
||||
|
||||
## Testing and Utility Functions
|
||||
|
||||
### `testLabelCount()`
|
||||
**Comprehensive readiness test - RUN THIS FIRST**
|
||||
|
||||
**Test Sequence**:
|
||||
1. **Label Existence**: Verifies the specified Gmail label exists
|
||||
2. **Thread Count**: Gets total number of threads with the label
|
||||
3. **Message Analysis**: Analyzes sample threads for size and structure
|
||||
4. **Parsing Test**: Tests email parsing on sample messages
|
||||
|
||||
**Output**:
|
||||
- Detailed test results and statistics
|
||||
- Available Gmail labels if target label not found
|
||||
- Thread size analysis (min/max/average messages per thread)
|
||||
- Parsing success rate on sample data
|
||||
|
||||
### `testSmallSample()`
|
||||
**Processes 3 sample emails with real-time output - RUN THIS SECOND**
|
||||
|
||||
**Purpose**:
|
||||
- Validates parsing logic with real data
|
||||
- Tests spreadsheet creation and formatting
|
||||
- Verifies real-time output functionality
|
||||
- Creates test spreadsheet for review
|
||||
|
||||
**Process**:
|
||||
- Creates test spreadsheet with all sheets
|
||||
- Processes up to 3 labeled emails
|
||||
- Applies full formatting
|
||||
- Returns test spreadsheet URL
|
||||
|
||||
### `checkCheckpointStatus()`
|
||||
**Displays current checkpoint information**
|
||||
|
||||
**Shows**:
|
||||
- Spreadsheet ID and URL
|
||||
- Processing progress (threads/emails)
|
||||
- Start time and label name
|
||||
- Resume instructions
|
||||
|
||||
### `clearCheckpointAndTriggers()`
|
||||
**Emergency function to clear stuck checkpoints and triggers**
|
||||
|
||||
**Use Cases**:
|
||||
- Stuck or corrupted checkpoint data
|
||||
- Multiple resume triggers created
|
||||
- Starting completely fresh export
|
||||
|
||||
### `getCurrentSpreadsheetUrl()`
|
||||
**Gets URL of currently active export spreadsheet**
|
||||
|
||||
**Returns**: Spreadsheet URL or null if no active export
|
||||
|
||||
---
|
||||
|
||||
## Data Structure
|
||||
|
||||
### Main Sheet Columns
|
||||
1. **Thread Number**: Sequential processing number
|
||||
2. **Confirmation Number**: OCM form confirmation ID
|
||||
3. **Submission Date**: Date form was submitted
|
||||
4. **First Name**: Submitter's first name
|
||||
5. **Last Name**: Submitter's last name
|
||||
6. **Email Address**: Contact email
|
||||
7. **Phone Number**: Contact phone
|
||||
8. **Fax**: Fax number (often empty)
|
||||
9. **Organization**: Organization name (often empty)
|
||||
10. **Street Address**: Primary street address
|
||||
11. **Street Address (cont.)**: Address continuation
|
||||
12. **City**: City name
|
||||
13. **Province**: Province/state
|
||||
14. **Country**: Country name
|
||||
15. **Postal Code**: Postal/ZIP code
|
||||
16. **Full Address**: Constructed complete address
|
||||
17. **Subject**: Form subject line
|
||||
18. **Comments**: Form comments/message
|
||||
19. **Gmail Date**: Email received date
|
||||
20. **Gmail From**: Email sender
|
||||
21. **Gmail Subject**: Email subject line
|
||||
22. **Thread ID**: Gmail thread identifier
|
||||
23. **Email ID**: Gmail message identifier
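For anyone post-processing the export, the column order above can be captured as a header array. The sketch below lists the 23 headers in the documented order and adds a hypothetical helper, `rowToRecord`, that turns one row array into an object keyed by header.

```javascript
const MAIN_SHEET_HEADERS = [
  'Thread Number', 'Confirmation Number', 'Submission Date',
  'First Name', 'Last Name', 'Email Address', 'Phone Number', 'Fax',
  'Organization', 'Street Address', 'Street Address (cont.)',
  'City', 'Province', 'Country', 'Postal Code', 'Full Address',
  'Subject', 'Comments',
  'Gmail Date', 'Gmail From', 'Gmail Subject', 'Thread ID', 'Email ID',
];

// Hypothetical helper: map a row array onto the headers; missing cells become ''.
function rowToRecord(row) {
  const record = {};
  MAIN_SHEET_HEADERS.forEach((header, i) => {
    record[header] = row[i] !== undefined ? row[i] : '';
  });
  return record;
}
```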
|
||||
|
||||
---
|
||||
|
||||
## Usage Instructions
|
||||
|
||||
### Initial Setup
|
||||
1. Update `LABEL_NAME` constant with your Gmail label
|
||||
2. Run `testLabelCount()` to verify setup
|
||||
3. Run `testSmallSample()` to test parsing
|
||||
4. Review test results before full export
|
||||
|
||||
### Full Export
|
||||
1. Run `downloadEmailsByLabel()` for complete export
|
||||
2. Monitor console logs for progress
|
||||
3. Export automatically resumes if interrupted
|
||||
4. Check email for completion notification
|
||||
|
||||
### Troubleshooting
|
||||
- Use `checkCheckpointStatus()` to view progress
|
||||
- Use `clearCheckpointAndTriggers()` to reset if stuck
|
||||
- Check error and log sheets for detailed information
|
||||
- Verify Gmail label exists and contains expected emails
|
||||
|
||||
### Performance Notes
|
||||
- Processes ~10-20 emails per minute depending on content
|
||||
- Automatically checkpoints every 5 minutes
|
||||
- Resumes automatically with 1-minute delay
|
||||
- Real-time output shows immediate results
|
||||
- Handles large exports (1000+ emails) through checkpointing
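As a back-of-the-envelope aid, the figures in this list can be combined into a rough duration estimate. The function below is a hypothetical sketch that assumes steady throughput, full-length runs, and one resume delay between consecutive runs. Actual timings will vary.

```javascript
// Rough estimate of total wall-clock minutes for an export.
function estimateExportMinutes(totalEmails, emailsPerMinute, runMinutes, resumeDelayMinutes) {
  const processing = totalEmails / emailsPerMinute;            // minutes of actual work
  const runs = Math.max(1, Math.ceil(processing / runMinutes)); // number of batches
  return processing + (runs - 1) * resumeDelayMinutes;          // add gaps between runs
}

console.log(estimateExportMinutes(1000, 10, 5, 1)); // prints 119
console.log(estimateExportMinutes(1000, 20, 5, 1)); // prints 59
```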
|
||||
BIN
image-1.png
Normal file
|
After Width: | Height: | Size: 42 KiB |
BIN
image-2.png
Normal file
|
After Width: | Height: | Size: 68 KiB |
BIN
image-3.png
Normal file
|
After Width: | Height: | Size: 53 KiB |
BIN
image-4.png
Normal file
|
After Width: | Height: | Size: 8.5 KiB |
BIN
image-5.png
Normal file
|
After Width: | Height: | Size: 37 KiB |
BIN
image-6.png
Normal file
|
After Width: | Height: | Size: 39 KiB |
BIN
image-7.png
Normal file
|
After Width: | Height: | Size: 74 KiB |