# Gmail to Google Sheets Export Script - Function Documentation ## Overview This Google Apps Script automatically exports OCM (Office of City Manager) contact form submissions from Gmail to Google Sheets. The script processes emails with a specific label, parses contact form data, and creates a structured spreadsheet with real-time output and checkpoint/resume functionality. ## Configuration Constants ### Email Processing - `LABEL_NAME`: Gmail label to process ("ocm_autopull") - `BATCH_SIZE`: Number of emails to write before updating checkpoint (10) - `MAX_EXECUTION_TIME`: Maximum script runtime before checkpoint (5 minutes) ### Spreadsheet Setup - `SPREADSHEET_NAME_PREFIX`: Prefix for generated spreadsheet names - `MAIN_SHEET_NAME`: Name of the main data sheet - `ERROR_SHEET_NAME`: Name of the error logging sheet - `LOG_SHEET_NAME`: Name of the processing log sheet ### Data Validation - `REQUIRED_OCM_PHRASES`: Keywords that must be present in OCM emails - `REPLY_INDICATORS`: Subject line patterns that indicate replies/forwards - `SKIP_SENDER_DOMAINS`: Email domains to skip (e.g., "edmonton.ca") - `EXCLUDE_COUNTRIES`: Countries to exclude from full address construction --- ## Main Functions ### `downloadEmailsByLabel()` **Primary export function that orchestrates the entire process** **Purpose**: Main entry point for exporting emails by label to Google Sheets with real-time output and checkpoint/resume capability. **Process**: 1. Checks for existing checkpoint to determine if resuming or starting fresh 2. Creates or connects to spreadsheet and initializes sheets 3. Scans Gmail for all threads with the specified label 4. Processes each thread's root email sequentially 5. Saves checkpoints periodically and on timeout 6. Formats final spreadsheet and sends completion notification **Returns**: Spreadsheet URL **Key Features**: - Real-time data output (writes immediately to sheets) - Automatic checkpoint saving every 10 emails - Time-based checkpointing (5-minute limit) - Automatic resume scheduling on timeout - Progress logging and error handling - Email notification on completion --- ## Sheet Management Functions ### `initializeSheets(spreadsheet, isResume)` **Creates or connects to spreadsheet sheets based on operation type** **Parameters**: - `spreadsheet`: The Google Spreadsheet object - `isResume`: Boolean indicating if this is a resume operation **Fresh Start Mode** (`isResume = false`): - Creates new sheets with unique timestamped names - Clears any existing content - Sets up headers and formatting - Freezes header rows **Resume Mode** (`isResume = true`): - Connects to existing sheets without clearing data - Preserves all previously processed data - Creates missing sheets as fallback - Logs number of existing data rows **Creates Three Sheets**: 1. **Main Data Sheet**: OCM contact form data 2. **Error Sheet**: Processing errors and exceptions 3. **Log Sheet**: Detailed processing activity log ### `getUniqueSheetNames()` **Generates unique sheet names with timestamps to avoid conflicts** **Returns**: Object containing unique names for main, error, and log sheets ### `writeRowToSheet(sheet, rowData)` **Writes a single row of data immediately to the specified sheet** **Parameters**: - `sheet`: Target Google Sheet object - `rowData`: Array of values to write **Purpose**: Enables real-time output by writing each processed email immediately rather than batching. ### `formatSheet(sheet, totalRows)` **Applies formatting to improve spreadsheet readability** **Formatting Applied**: - Auto-resizes all columns - Sets specific column widths for OCM data fields - Applies alternating row banding (light grey) - Formats date columns with consistent date/time format - Optimizes layout for OCM contact form structure --- ## Data Processing Functions ### `getAllThreadsForLabel(label)` **Retrieves ALL email threads for a Gmail label, handling pagination** **Parameters**: - `label`: Gmail Label object **Process**: - Handles Gmail's 500-thread pagination limit - Retrieves threads in batches with progress logging - Includes brief pauses between batches to avoid rate limits **Returns**: Array of all Gmail Thread objects ### `parseOCMContactForm(message, thread, threadNumber)` **Parses OCM contact form data from email body using regex patterns** **Parameters**: - `message`: Gmail Message object - `thread`: Gmail Thread object - `threadNumber`: Sequential thread number for logging **Extracted Fields**: - Confirmation Number - Submission Date - Personal Info: First Name, Last Name, Email, Phone, Fax - Organization details - Address: Street, City, Province, Country, Postal Code - Subject and Comments - Gmail metadata: Date, From, Subject, Thread ID, Message ID **Data Cleaning**: - Removes email quote markers and formatting artifacts - Normalizes whitespace and removes colons - Cleans email addresses from mailto links - Handles empty organization and fax fields - Strips footer text from comments **Filtering**: - Skips obvious replies/forwards based on subject indicators - Excludes emails from specified domains (e.g., edmonton.ca) - Processes only root emails from each thread **Error Handling**: - Returns error row with available metadata if parsing fails - Logs detailed error information for troubleshooting ### `constructFullAddress(street, streetCont, city, province, country, postal)` **Builds formatted full address from individual components** **Address Construction Rules**: - Validates and cleans each component - Excludes specified countries (e.g., "Canada") - Removes field prefixes that leaked through parsing - Joins components with commas - Handles empty or duplicate fields gracefully **Returns**: Formatted address string --- ## Logging and Error Handling ### `logMessage(action, threadNumber, message, details)` **Logs processing activities to the log sheet and console** **Parameters**: - `action`: Type of action (START, PROCESS, SUCCESS, ERROR, etc.) - `threadNumber`: Current thread number - `message`: Primary log message - `details`: Additional details (optional) **Log Entry Format**: - Timestamp - Thread Number - Action Type - Message with Details - Status (SUCCESS/ERROR) ### `logError(threadNumber, errorType, errorDetails, emailDate, emailFrom)` **Logs errors to the dedicated error sheet** **Error Entry Format**: - Thread Number - Error Type - Detailed Error Description - Email Date and Sender - Error Timestamp **Common Error Types**: - Parsing Error: Failed to extract form data - Thread Processing Error: General thread handling failure - Sheet Writing Error: Failed to write to spreadsheet --- ## Checkpoint and Resume System ### `saveCheckpoint(checkpointData)` **Saves current progress to Google Apps Script Properties** **Checkpoint Data Includes**: - Spreadsheet ID - Label name - Total thread count - Number of threads processed - Number of emails found - Start timestamp ### `getCheckpoint()` **Retrieves saved checkpoint data** **Returns**: Checkpoint object or null if no checkpoint exists ### `clearCheckpoint()` **Removes checkpoint data when export completes** ### `scheduleResume()` **Creates time-based trigger to automatically resume processing** **Process**: - Deletes any existing resume triggers - Creates new trigger to run `resumeExport()` after specified delay - Default delay: 1 minute ### `resumeExport()` **Triggered function that resumes export from checkpoint** **Process**: - Cleans up the trigger that called it - Calls `downloadEmailsByLabel()` to continue processing --- ## Testing and Utility Functions ### `testLabelCount()` **Comprehensive readiness test - RUN THIS FIRST** **Test Sequence**: 1. **Label Existence**: Verifies the specified Gmail label exists 2. **Thread Count**: Gets total number of threads with the label 3. **Message Analysis**: Analyzes sample threads for size and structure 4. **Parsing Test**: Tests email parsing on sample messages **Output**: - Detailed test results and statistics - Available Gmail labels if target label not found - Thread size analysis (min/max/average messages per thread) - Parsing success rate on sample data ### `testSmallSample()` **Processes 3 sample emails with real-time output - RUN THIS SECOND** **Purpose**: - Validates parsing logic with real data - Tests spreadsheet creation and formatting - Verifies real-time output functionality - Creates test spreadsheet for review **Process**: - Creates test spreadsheet with all sheets - Processes up to 3 labeled emails - Applies full formatting - Returns test spreadsheet URL ### `checkCheckpointStatus()` **Displays current checkpoint information** **Shows**: - Spreadsheet ID and URL - Processing progress (threads/emails) - Start time and label name - Resume instructions ### `clearCheckpointAndTriggers()` **Emergency function to clear stuck checkpoints and triggers** **Use Cases**: - Stuck or corrupted checkpoint data - Multiple resume triggers created - Starting completely fresh export ### `getCurrentSpreadsheetUrl()` **Gets URL of currently active export spreadsheet** **Returns**: Spreadsheet URL or null if no active export --- ## Data Structure ### Main Sheet Columns 1. **Thread Number**: Sequential processing number 2. **Confirmation Number**: OCM form confirmation ID 3. **Submission Date**: Date form was submitted 4. **First Name**: Submitter's first name 5. **Last Name**: Submitter's last name 6. **Email Address**: Contact email 7. **Phone Number**: Contact phone 8. **Fax**: Fax number (often empty) 9. **Organization**: Organization name (often empty) 10. **Street Address**: Primary street address 11. **Street Address (cont.)**: Address continuation 12. **City**: City name 13. **Province**: Province/state 14. **Country**: Country name 15. **Postal Code**: Postal/ZIP code 16. **Full Address**: Constructed complete address 17. **Subject**: Form subject line 18. **Comments**: Form comments/message 19. **Gmail Date**: Email received date 20. **Gmail From**: Email sender 21. **Gmail Subject**: Email subject line 22. **Thread ID**: Gmail thread identifier 23. **Email ID**: Gmail message identifier --- ## Usage Instructions ### Initial Setup 1. Update `LABEL_NAME` constant with your Gmail label 2. Run `testLabelCount()` to verify setup 3. Run `testSmallSample()` to test parsing 4. Review test results before full export ### Full Export 1. Run `downloadEmailsByLabel()` for complete export 2. Monitor console logs for progress 3. Export automatically resumes if interrupted 4. Check email for completion notification ### Troubleshooting - Use `checkCheckpointStatus()` to view progress - Use `clearCheckpointAndTriggers()` to reset if stuck - Check error and log sheets for detailed information - Verify Gmail label exists and contains expected emails ### Performance Notes - Processes ~10-20 emails per minute depending on content - Automatically checkpoints every 5 minutes - Resumes automatically with 1-minute delay - Real-time output shows immediate results - Handles large exports (1000+ emails) through checkpointing