initial commit: ocm_autopull docs
83
README.md
Normal file
@ -0,0 +1,83 @@
# ocm_autopull

The following instructions describe how to export all OCM data (City of Edmonton contact form data) into a Google Sheet using Apps Script.

Using internal resources, this process builds a single spreadsheet with all available OCM emails from a user's account.

## How To

### Step 1 - Create Label

In Gmail, create a new label called `ocm_autopull`.

Search your email and label all OCM emails. For example:

To: `councillor's email`

Has the words: This message is intended for Councillor `councillor name` and their staff.

Select all of those emails and apply the label you just created.

### Step 2 - Create Apps Script Project

In your web browser, go to

```
https://script.google.com/home/
```

and create a new project called `ocm_autopull`.

### Step 3 - Add Script to ocm_autopull project

In your new project, delete the default function and replace it with the code from here:

#### [code](code.md)



Remember to click save after you have copied the code over.

#### Configure

You can configure the script using the settings at the top. If you already have a Gmail label for OCM emails, update the label name in the configuration section of the script:



### Step 4 - Test

After saving, you will have new functions available. The first function to run is `testSmallSample`. Select the function, then click run:



This tests a few emails from your `ocm_autopull` label and provides a readout.



Review the sheet it produces and, if it looks good, continue to the next step.

### Step 5 - Run Script

If the test passes, you are ready to run the script. Select the function called `downloadEmailsByLabel` and click run:



**Now wait for the script to run.** It will take a few minutes, on average a minute per hundred emails.

The script runs in 5-minute batches so that it stays compliant with Google's requirements. Sometimes a batch will be shorter; this is normal.

When you click `run`, the script outputs the location of the sheet. You can observe this sheet, and its logs, to see data being added in real time.



### Failure Case

On occasion, a Google Apps Script run can fail. If this happens, you should be able to click `run` again and the script will continue processing emails as normal.



#### Total Breakage

If the script fails entirely, delete the output spreadsheet and try running it again.
352
code_function.md
Normal file
@ -0,0 +1,352 @@
# Gmail to Google Sheets Export Script - Function Documentation

## Overview

This Google Apps Script automatically exports OCM (Office of City Manager) contact form submissions from Gmail to Google Sheets. The script processes emails with a specific label, parses contact form data, and creates a structured spreadsheet with real-time output and checkpoint/resume functionality.

## Configuration Constants

### Email Processing
- `LABEL_NAME`: Gmail label to process ("ocm_autopull")
- `BATCH_SIZE`: Number of emails to write before updating checkpoint (10)
- `MAX_EXECUTION_TIME`: Maximum script runtime before checkpoint (5 minutes)

### Spreadsheet Setup
- `SPREADSHEET_NAME_PREFIX`: Prefix for generated spreadsheet names
- `MAIN_SHEET_NAME`: Name of the main data sheet
- `ERROR_SHEET_NAME`: Name of the error logging sheet
- `LOG_SHEET_NAME`: Name of the processing log sheet

### Data Validation
- `REQUIRED_OCM_PHRASES`: Keywords that must be present in OCM emails
- `REPLY_INDICATORS`: Subject line patterns that indicate replies/forwards
- `SKIP_SENDER_DOMAINS`: Email domains to skip (e.g., "edmonton.ca")
- `EXCLUDE_COUNTRIES`: Countries to exclude from full address construction
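
As a rough sketch, the constants listed above might be declared at the top of the script along these lines. Only the `LABEL_NAME` value, the `BATCH_SIZE` value, and the 5-minute limit come from this document; the millisecond unit and every other value below are placeholder assumptions and may not match the real script.

```javascript
// Values stated in this documentation.
const LABEL_NAME = "ocm_autopull";
const BATCH_SIZE = 10;                       // emails written between checkpoints
const MAX_EXECUTION_TIME = 5 * 60 * 1000;    // 5 minutes (assumed to be held in milliseconds)

// Placeholder values (assumptions for illustration only).
const SPREADSHEET_NAME_PREFIX = "OCM Export";
const MAIN_SHEET_NAME = "OCM Data";
const ERROR_SHEET_NAME = "Errors";
const LOG_SHEET_NAME = "Log";

// Example entries; "edmonton.ca" and "Canada" are the examples given above,
// the phrase is the one quoted in the README search step.
const REQUIRED_OCM_PHRASES = ["This message is intended for Councillor"];
const REPLY_INDICATORS = ["RE:", "FW:", "FWD:"];
const SKIP_SENDER_DOMAINS = ["edmonton.ca"];
const EXCLUDE_COUNTRIES = ["Canada"];
```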

---

## Main Functions

### `downloadEmailsByLabel()`
**Primary export function that orchestrates the entire process**

**Purpose**: Main entry point for exporting emails by label to Google Sheets with real-time output and checkpoint/resume capability.

**Process**:
1. Checks for existing checkpoint to determine if resuming or starting fresh
2. Creates or connects to spreadsheet and initializes sheets
3. Scans Gmail for all threads with the specified label
4. Processes each thread's root email sequentially
5. Saves checkpoints periodically and on timeout
6. Formats final spreadsheet and sends completion notification

**Returns**: Spreadsheet URL

**Key Features**:
- Real-time data output (writes immediately to sheets)
- Automatic checkpoint saving every 10 emails
- Time-based checkpointing (5-minute limit)
- Automatic resume scheduling on timeout
- Progress logging and error handling
- Email notification on completion
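
The checkpointing behaviour described above (a checkpoint every 10 emails, plus a time-based stop) can be sketched as a plain loop. This is an illustration under assumptions, not the script's actual code: `processWithCheckpoints`, its option names, and the injectable clock are all hypothetical.

```javascript
// Hypothetical helper: walks `items` from `startIndex`, saving a checkpoint
// every `batchSize` items and stopping early once `maxMs` has elapsed.
function processWithCheckpoints(items, options) {
  const {
    startIndex = 0,          // where a resumed run picks up
    batchSize = 10,          // mirrors BATCH_SIZE
    maxMs = 5 * 60 * 1000,   // mirrors the 5-minute MAX_EXECUTION_TIME
    now = Date.now,          // injectable clock, handy for testing
    handle,                  // called once per item
    saveCheckpoint,          // called with the number of items finished so far
  } = options;

  const startedAt = now();
  let processed = startIndex;

  for (let i = startIndex; i < items.length; i++) {
    if (now() - startedAt >= maxMs) {
      saveCheckpoint(processed);           // time-based checkpoint
      return { done: false, processed };   // caller would schedule a resume
    }
    handle(items[i], i);
    processed = i + 1;
    if (processed % batchSize === 0) {
      saveCheckpoint(processed);           // periodic checkpoint
    }
  }
  return { done: true, processed };
}
```

A resumed run would pass the saved count back in as `startIndex`, so already-processed threads are skipped rather than written twice.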

---

## Sheet Management Functions

### `initializeSheets(spreadsheet, isResume)`
**Creates or connects to spreadsheet sheets based on operation type**

**Parameters**:
- `spreadsheet`: The Google Spreadsheet object
- `isResume`: Boolean indicating if this is a resume operation

**Fresh Start Mode** (`isResume = false`):
- Creates new sheets with unique timestamped names
- Clears any existing content
- Sets up headers and formatting
- Freezes header rows

**Resume Mode** (`isResume = true`):
- Connects to existing sheets without clearing data
- Preserves all previously processed data
- Creates missing sheets as fallback
- Logs number of existing data rows

**Creates Three Sheets**:
1. **Main Data Sheet**: OCM contact form data
2. **Error Sheet**: Processing errors and exceptions
3. **Log Sheet**: Detailed processing activity log

### `getUniqueSheetNames()`
**Generates unique sheet names with timestamps to avoid conflicts**

**Returns**: Object containing unique names for main, error, and log sheets
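
One way to build timestamped names like those described above is sketched here. The base names, the `main`/`error`/`log` property names, and the timestamp layout are assumptions for illustration; the real function may format things differently.

```javascript
// Hypothetical sketch: appends a UTC timestamp to each base sheet name.
function getUniqueSheetNamesSketch(date = new Date()) {
  // "2024-01-31T12:34:56.789Z" -> "20240131_123456"
  const stamp = date
    .toISOString()
    .replace(/[-:]/g, "")
    .replace("T", "_")
    .slice(0, 15);
  return {
    main: `OCM Data ${stamp}`,   // placeholder base names
    error: `Errors ${stamp}`,
    log: `Log ${stamp}`,
  };
}
```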

### `writeRowToSheet(sheet, rowData)`
**Writes a single row of data immediately to the specified sheet**

**Parameters**:
- `sheet`: Target Google Sheet object
- `rowData`: Array of values to write

**Purpose**: Enables real-time output by writing each processed email immediately rather than batching.

### `formatSheet(sheet, totalRows)`
**Applies formatting to improve spreadsheet readability**

**Formatting Applied**:
- Auto-resizes all columns
- Sets specific column widths for OCM data fields
- Applies alternating row banding (light grey)
- Formats date columns with consistent date/time format
- Optimizes layout for OCM contact form structure

---

## Data Processing Functions

### `getAllThreadsForLabel(label)`
**Retrieves ALL email threads for a Gmail label, handling pagination**

**Parameters**:
- `label`: Gmail Label object

**Process**:
- Handles Gmail's 500-thread pagination limit
- Retrieves threads in batches with progress logging
- Includes brief pauses between batches to avoid rate limits

**Returns**: Array of all Gmail Thread objects
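
The pagination described above can be sketched as follows. The sketch assumes `label` behaves like an Apps Script `GmailLabel`, whose `getThreads(start, max)` method returns one page of threads; the function name and the injectable `pause` callback are hypothetical, and the real script may log progress or pause differently.

```javascript
// Hypothetical sketch: pages through a label 500 threads at a time.
function getAllThreadsForLabelSketch(label, pause = () => {}) {
  const PAGE_SIZE = 500;   // the 500-thread page limit mentioned above
  const threads = [];
  let start = 0;

  while (true) {
    const page = label.getThreads(start, PAGE_SIZE);
    threads.push(...page);
    if (page.length < PAGE_SIZE) break;   // a short page means we reached the end
    start += PAGE_SIZE;
    pause();   // in Apps Script this could be () => Utilities.sleep(100)
  }
  return threads;
}
```

In Apps Script, the label object would typically come from `GmailApp.getUserLabelByName(LABEL_NAME)`.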

### `parseOCMContactForm(message, thread, threadNumber)`
**Parses OCM contact form data from email body using regex patterns**

**Parameters**:
- `message`: Gmail Message object
- `thread`: Gmail Thread object
- `threadNumber`: Sequential thread number for logging

**Extracted Fields**:
- Confirmation Number
- Submission Date
- Personal Info: First Name, Last Name, Email, Phone, Fax
- Organization details
- Address: Street, City, Province, Country, Postal Code
- Subject and Comments
- Gmail metadata: Date, From, Subject, Thread ID, Message ID

**Data Cleaning**:
- Removes email quote markers and formatting artifacts
- Normalizes whitespace and removes colons
- Cleans email addresses from mailto links
- Handles empty organization and fax fields
- Strips footer text from comments

**Filtering**:
- Skips obvious replies/forwards based on subject indicators
- Excludes emails from specified domains (e.g., edmonton.ca)
- Processes only root emails from each thread

**Error Handling**:
- Returns error row with available metadata if parsing fails
- Logs detailed error information for troubleshooting
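
To make the regex-based extraction and cleaning above more concrete, here is a minimal sketch. It assumes the email body contains one `Label: value` pair per line, which may not match the real OCM layout; the label list, `escapeRegExp`, and `parseFieldsSketch` are hypothetical names used only for this example.

```javascript
// Hypothetical field labels; the real form wording may differ.
const FIELD_LABELS = ["Confirmation Number", "First Name", "Last Name", "Email", "Phone", "Subject"];

// Escapes characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function parseFieldsSketch(body) {
  // Remove "> " quote markers at the start of each line.
  const cleaned = body.replace(/^[ \t]*(?:>[ \t]*)+/gm, "");
  const record = {};

  for (const label of FIELD_LABELS) {
    // Match "Label: value" on a single line.
    const pattern = new RegExp("^" + escapeRegExp(label) + "[ \\t]*:[ \\t]*(.*)$", "m");
    const match = cleaned.match(pattern);
    let value = match ? match[1] : "";
    // "a@b.ca<mailto:a@b.ca>" -> "a@b.ca"
    value = value.replace(/<mailto:[^>]*>/gi, "");
    // Collapse runs of whitespace and trim the ends.
    record[label] = value.replace(/\s+/g, " ").trim();
  }
  return record;
}
```

Missing or empty fields come back as empty strings, which keeps every record the same shape for writing to the sheet.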

### `constructFullAddress(street, streetCont, city, province, country, postal)`
**Builds formatted full address from individual components**

**Address Construction Rules**:
- Validates and cleans each component
- Excludes specified countries (e.g., "Canada")
- Removes field prefixes that leaked through parsing
- Joins components with commas
- Handles empty or duplicate fields gracefully

**Returns**: Formatted address string
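
The construction rules above could be implemented roughly as follows. This is a sketch under assumptions: the prefix-stripping pattern, the case-insensitive duplicate check, and the extra `excludeCountries` parameter are illustrative choices, not necessarily what the script does.

```javascript
// Hypothetical sketch of the address rules listed above.
function constructFullAddressSketch(street, streetCont, city, province, country, postal,
                                    excludeCountries = ["Canada"]) {
  const excluded = excludeCountries.map((name) => name.toLowerCase());
  const fields = [
    { value: street },
    { value: streetCont },
    { value: city },
    { value: province },
    { value: country, isCountry: true },
    { value: postal },
  ];

  const seen = new Set();
  const parts = [];
  for (const field of fields) {
    const value = String(field.value || "")
      .replace(/^[A-Za-z .()]+:\s*/, "")   // drop a leaked "City:" style prefix
      .replace(/\s+/g, " ")                // normalize whitespace
      .trim();
    if (!value) continue;                                      // skip empty fields
    const key = value.toLowerCase();
    if (field.isCountry && excluded.includes(key)) continue;   // skip excluded countries
    if (seen.has(key)) continue;                               // skip duplicates
    seen.add(key);
    parts.push(value);
  }
  return parts.join(", ");
}
```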

---

## Logging and Error Handling

### `logMessage(action, threadNumber, message, details)`
**Logs processing activities to the log sheet and console**

**Parameters**:
- `action`: Type of action (START, PROCESS, SUCCESS, ERROR, etc.)
- `threadNumber`: Current thread number
- `message`: Primary log message
- `details`: Additional details (optional)

**Log Entry Format**:
- Timestamp
- Thread Number
- Action Type
- Message with Details
- Status (SUCCESS/ERROR)

### `logError(threadNumber, errorType, errorDetails, emailDate, emailFrom)`
**Logs errors to the dedicated error sheet**

**Error Entry Format**:
- Thread Number
- Error Type
- Detailed Error Description
- Email Date and Sender
- Error Timestamp

**Common Error Types**:
- Parsing Error: Failed to extract form data
- Thread Processing Error: General thread handling failure
- Sheet Writing Error: Failed to write to spreadsheet

---

## Checkpoint and Resume System

### `saveCheckpoint(checkpointData)`
**Saves current progress to Google Apps Script Properties**

**Checkpoint Data Includes**:
- Spreadsheet ID
- Label name
- Total thread count
- Number of threads processed
- Number of emails found
- Start timestamp

### `getCheckpoint()`
**Retrieves saved checkpoint data**

**Returns**: Checkpoint object or null if no checkpoint exists

### `clearCheckpoint()`
**Removes checkpoint data when export completes**
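
A minimal sketch of how the three checkpoint functions above might store a checkpoint as JSON in script properties. The `store` parameter stands in for an Apps Script `Properties` object (for example `PropertiesService.getScriptProperties()`), which provides `getProperty`, `setProperty`, and `deleteProperty`; the key name and the `...Sketch` function names are assumptions.

```javascript
const CHECKPOINT_KEY = "OCM_CHECKPOINT";   // hypothetical property name

// Serializes the checkpoint object into a single string property.
function saveCheckpointSketch(store, checkpointData) {
  store.setProperty(CHECKPOINT_KEY, JSON.stringify(checkpointData));
}

// Returns the parsed checkpoint, or null when none has been saved.
function getCheckpointSketch(store) {
  const raw = store.getProperty(CHECKPOINT_KEY);
  return raw ? JSON.parse(raw) : null;
}

// Removes the checkpoint once the export has completed.
function clearCheckpointSketch(store) {
  store.deleteProperty(CHECKPOINT_KEY);
}
```

Passing the store in as a parameter is only a convenience for this sketch; it lets the same functions run against an in-memory stand-in outside Apps Script.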

### `scheduleResume()`
**Creates time-based trigger to automatically resume processing**

**Process**:
- Deletes any existing resume triggers
- Creates new trigger to run `resumeExport()` after specified delay
- Default delay: 1 minute

### `resumeExport()`
**Triggered function that resumes export from checkpoint**

**Process**:
- Cleans up the trigger that called it
- Calls `downloadEmailsByLabel()` to continue processing
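
The trigger handling described above can be sketched like this. The `scriptApp` parameter stands in for the Apps Script `ScriptApp` service, whose `getProjectTriggers()`, `deleteTrigger()`, and `newTrigger(...).timeBased().after(ms).create()` calls are used here; the function name and the choice to pass the service in as a parameter are assumptions made for illustration.

```javascript
// Hypothetical sketch: replaces any pending resume trigger with a fresh one.
function scheduleResumeSketch(scriptApp, delayMs = 60 * 1000) {
  // Remove existing triggers that point at resumeExport to avoid duplicates.
  scriptApp
    .getProjectTriggers()
    .filter((trigger) => trigger.getHandlerFunction() === "resumeExport")
    .forEach((trigger) => scriptApp.deleteTrigger(trigger));

  // Schedule a one-off run of resumeExport after the delay (1 minute by default).
  return scriptApp
    .newTrigger("resumeExport")
    .timeBased()
    .after(delayMs)
    .create();
}
```

Inside Apps Script this would be called as `scheduleResumeSketch(ScriptApp)`.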

---

## Testing and Utility Functions

### `testLabelCount()`
**Comprehensive readiness test - RUN THIS FIRST**

**Test Sequence**:
1. **Label Existence**: Verifies the specified Gmail label exists
2. **Thread Count**: Gets total number of threads with the label
3. **Message Analysis**: Analyzes sample threads for size and structure
4. **Parsing Test**: Tests email parsing on sample messages

**Output**:
- Detailed test results and statistics
- Available Gmail labels if target label not found
- Thread size analysis (min/max/average messages per thread)
- Parsing success rate on sample data

### `testSmallSample()`
**Processes 3 sample emails with real-time output - RUN THIS SECOND**

**Purpose**:
- Validates parsing logic with real data
- Tests spreadsheet creation and formatting
- Verifies real-time output functionality
- Creates test spreadsheet for review

**Process**:
- Creates test spreadsheet with all sheets
- Processes up to 3 labeled emails
- Applies full formatting
- Returns test spreadsheet URL

### `checkCheckpointStatus()`
**Displays current checkpoint information**

**Shows**:
- Spreadsheet ID and URL
- Processing progress (threads/emails)
- Start time and label name
- Resume instructions

### `clearCheckpointAndTriggers()`
**Emergency function to clear stuck checkpoints and triggers**

**Use Cases**:
- Stuck or corrupted checkpoint data
- Multiple resume triggers created
- Starting completely fresh export

### `getCurrentSpreadsheetUrl()`
**Gets URL of currently active export spreadsheet**

**Returns**: Spreadsheet URL or null if no active export

---

## Data Structure

### Main Sheet Columns
1. **Thread Number**: Sequential processing number
2. **Confirmation Number**: OCM form confirmation ID
3. **Submission Date**: Date form was submitted
4. **First Name**: Submitter's first name
5. **Last Name**: Submitter's last name
6. **Email Address**: Contact email
7. **Phone Number**: Contact phone
8. **Fax**: Fax number (often empty)
9. **Organization**: Organization name (often empty)
10. **Street Address**: Primary street address
11. **Street Address (cont.)**: Address continuation
12. **City**: City name
13. **Province**: Province/state
14. **Country**: Country name
15. **Postal Code**: Postal/ZIP code
16. **Full Address**: Constructed complete address
17. **Subject**: Form subject line
18. **Comments**: Form comments/message
19. **Gmail Date**: Email received date
20. **Gmail From**: Email sender
21. **Gmail Subject**: Email subject line
22. **Thread ID**: Gmail thread identifier
23. **Email ID**: Gmail message identifier
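
As an illustration of how a parsed record could be turned into a row in the column order above, here is a small sketch. The `MAIN_SHEET_COLUMNS` array and `toRow` helper are hypothetical, and keying the record by column title is an assumption; the real script may build its rows differently.

```javascript
// Column titles in the order listed above.
const MAIN_SHEET_COLUMNS = [
  "Thread Number", "Confirmation Number", "Submission Date",
  "First Name", "Last Name", "Email Address", "Phone Number", "Fax",
  "Organization", "Street Address", "Street Address (cont.)",
  "City", "Province", "Country", "Postal Code", "Full Address",
  "Subject", "Comments",
  "Gmail Date", "Gmail From", "Gmail Subject", "Thread ID", "Email ID",
];

// Builds a 23-value row; fields missing from the record become empty strings.
function toRow(record) {
  return MAIN_SHEET_COLUMNS.map((column) => {
    const value = record[column];
    return value === undefined || value === null ? "" : value;
  });
}

// In Apps Script, a row like this could be written with sheet.appendRow(row).
```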

---

## Usage Instructions

### Initial Setup
1. Update `LABEL_NAME` constant with your Gmail label
2. Run `testLabelCount()` to verify setup
3. Run `testSmallSample()` to test parsing
4. Review test results before full export

### Full Export
1. Run `downloadEmailsByLabel()` for complete export
2. Monitor console logs for progress
3. Export automatically resumes if interrupted
4. Check email for completion notification

### Troubleshooting
- Use `checkCheckpointStatus()` to view progress
- Use `clearCheckpointAndTriggers()` to reset if stuck
- Check error and log sheets for detailed information
- Verify Gmail label exists and contains expected emails

### Performance Notes
- Processes ~10-20 emails per minute depending on content
- Automatically checkpoints every 5 minutes
- Resumes automatically with 1-minute delay
- Real-time output shows immediate results
- Handles large exports (1000+ emails) through checkpointing
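
As a rough worked example of the figures above, the sketch below estimates how long a large export might take. It assumes every batch uses its full 5 minutes and that one 1-minute resume delay sits between consecutive batches; `estimateRunMinutes` is a hypothetical helper, and real runs will vary.

```javascript
// Hypothetical estimate based on the performance notes above.
function estimateRunMinutes(emailCount, emailsPerMinute, batchMinutes = 5, resumeDelayMinutes = 1) {
  const processingMinutes = emailCount / emailsPerMinute;
  const batches = Math.ceil(processingMinutes / batchMinutes);
  const pauseMinutes = Math.max(batches - 1, 0) * resumeDelayMinutes;
  return { batches, totalMinutes: processingMinutes + pauseMinutes };
}

// 1000 emails at 10 per minute: 20 batches, about 119 minutes in total.
// 1000 emails at 20 per minute: 10 batches, about 59 minutes in total.
```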
BIN
image-1.png
Normal file
Size: 42 KiB
BIN
image-2.png
Normal file
Size: 68 KiB
BIN
image-3.png
Normal file
Size: 53 KiB
BIN
image-4.png
Normal file
Size: 8.5 KiB
BIN
image-5.png
Normal file
Size: 37 KiB
BIN
image-6.png
Normal file
Size: 39 KiB
BIN
image-7.png
Normal file
Size: 74 KiB