ocm.processer/code_function.md

11 KiB

Gmail to Google Sheets Export Script - Function Documentation

Overview

This Google Apps Script automatically exports OCM (Office of City Manager) contact form submissions from Gmail to Google Sheets. The script processes emails with a specific label, parses contact form data, and creates a structured spreadsheet with real-time output and checkpoint/resume functionality.

Configuration Constants

Email Processing

  • LABEL_NAME: Gmail label to process ("ocm_autopull")
  • BATCH_SIZE: Number of emails to write before updating checkpoint (10)
  • MAX_EXECUTION_TIME: Maximum script runtime before checkpoint (5 minutes)

Spreadsheet Setup

  • SPREADSHEET_NAME_PREFIX: Prefix for generated spreadsheet names
  • MAIN_SHEET_NAME: Name of the main data sheet
  • ERROR_SHEET_NAME: Name of the error logging sheet
  • LOG_SHEET_NAME: Name of the processing log sheet

Data Validation

  • REQUIRED_OCM_PHRASES: Keywords that must be present in OCM emails
  • REPLY_INDICATORS: Subject line patterns that indicate replies/forwards
  • SKIP_SENDER_DOMAINS: Email domains to skip (e.g., "edmonton.ca")
  • EXCLUDE_COUNTRIES: Countries to exclude from full address construction

Main Functions

downloadEmailsByLabel()

Primary export function that orchestrates the entire process

Purpose: Main entry point for exporting emails by label to Google Sheets with real-time output and checkpoint/resume capability.

Process:

  1. Checks for existing checkpoint to determine if resuming or starting fresh
  2. Creates or connects to spreadsheet and initializes sheets
  3. Scans Gmail for all threads with the specified label
  4. Processes each thread's root email sequentially
  5. Saves checkpoints periodically and on timeout
  6. Formats final spreadsheet and sends completion notification

Returns: Spreadsheet URL

Key Features:

  • Real-time data output (writes immediately to sheets)
  • Automatic checkpoint saving every 10 emails
  • Time-based checkpointing (5-minute limit)
  • Automatic resume scheduling on timeout
  • Progress logging and error handling
  • Email notification on completion

Sheet Management Functions

initializeSheets(spreadsheet, isResume)

Creates or connects to spreadsheet sheets based on operation type

Parameters:

  • spreadsheet: The Google Spreadsheet object
  • isResume: Boolean indicating if this is a resume operation

Fresh Start Mode (isResume = false):

  • Creates new sheets with unique timestamped names
  • Clears any existing content
  • Sets up headers and formatting
  • Freezes header rows

Resume Mode (isResume = true):

  • Connects to existing sheets without clearing data
  • Preserves all previously processed data
  • Creates missing sheets as fallback
  • Logs number of existing data rows

Creates Three Sheets:

  1. Main Data Sheet: OCM contact form data
  2. Error Sheet: Processing errors and exceptions
  3. Log Sheet: Detailed processing activity log

getUniqueSheetNames()

Generates unique sheet names with timestamps to avoid conflicts

Returns: Object containing unique names for main, error, and log sheets

writeRowToSheet(sheet, rowData)

Writes a single row of data immediately to the specified sheet

Parameters:

  • sheet: Target Google Sheet object
  • rowData: Array of values to write

Purpose: Enables real-time output by writing each processed email immediately rather than batching.

formatSheet(sheet, totalRows)

Applies formatting to improve spreadsheet readability

Formatting Applied:

  • Auto-resizes all columns
  • Sets specific column widths for OCM data fields
  • Applies alternating row banding (light grey)
  • Formats date columns with consistent date/time format
  • Optimizes layout for OCM contact form structure

Data Processing Functions

getAllThreadsForLabel(label)

Retrieves ALL email threads for a Gmail label, handling pagination

Parameters:

  • label: Gmail Label object

Process:

  • Handles Gmail's 500-thread pagination limit
  • Retrieves threads in batches with progress logging
  • Includes brief pauses between batches to avoid rate limits

Returns: Array of all Gmail Thread objects

parseOCMContactForm(message, thread, threadNumber)

Parses OCM contact form data from email body using regex patterns

Parameters:

  • message: Gmail Message object
  • thread: Gmail Thread object
  • threadNumber: Sequential thread number for logging

Extracted Fields:

  • Confirmation Number
  • Submission Date
  • Personal Info: First Name, Last Name, Email, Phone, Fax
  • Organization details
  • Address: Street, City, Province, Country, Postal Code
  • Subject and Comments
  • Gmail metadata: Date, From, Subject, Thread ID, Message ID

Data Cleaning:

  • Removes email quote markers and formatting artifacts
  • Normalizes whitespace and removes colons
  • Cleans email addresses from mailto links
  • Handles empty organization and fax fields
  • Strips footer text from comments

Filtering:

  • Skips obvious replies/forwards based on subject indicators
  • Excludes emails from specified domains (e.g., edmonton.ca)
  • Processes only root emails from each thread

Error Handling:

  • Returns error row with available metadata if parsing fails
  • Logs detailed error information for troubleshooting

constructFullAddress(street, streetCont, city, province, country, postal)

Builds formatted full address from individual components

Address Construction Rules:

  • Validates and cleans each component
  • Excludes specified countries (e.g., "Canada")
  • Removes field prefixes that leaked through parsing
  • Joins components with commas
  • Handles empty or duplicate fields gracefully

Returns: Formatted address string


Logging and Error Handling

logMessage(action, threadNumber, message, details)

Logs processing activities to the log sheet and console

Parameters:

  • action: Type of action (START, PROCESS, SUCCESS, ERROR, etc.)
  • threadNumber: Current thread number
  • message: Primary log message
  • details: Additional details (optional)

Log Entry Format:

  • Timestamp
  • Thread Number
  • Action Type
  • Message with Details
  • Status (SUCCESS/ERROR)

logError(threadNumber, errorType, errorDetails, emailDate, emailFrom)

Logs errors to the dedicated error sheet

Error Entry Format:

  • Thread Number
  • Error Type
  • Detailed Error Description
  • Email Date and Sender
  • Error Timestamp

Common Error Types:

  • Parsing Error: Failed to extract form data
  • Thread Processing Error: General thread handling failure
  • Sheet Writing Error: Failed to write to spreadsheet

Checkpoint and Resume System

saveCheckpoint(checkpointData)

Saves current progress to Google Apps Script Properties

Checkpoint Data Includes:

  • Spreadsheet ID
  • Label name
  • Total thread count
  • Number of threads processed
  • Number of emails found
  • Start timestamp

getCheckpoint()

Retrieves saved checkpoint data

Returns: Checkpoint object or null if no checkpoint exists

clearCheckpoint()

Removes checkpoint data when export completes

scheduleResume()

Creates time-based trigger to automatically resume processing

Process:

  • Deletes any existing resume triggers
  • Creates new trigger to run resumeExport() after specified delay
  • Default delay: 1 minute

resumeExport()

Triggered function that resumes export from checkpoint

Process:

  • Cleans up the trigger that called it
  • Calls downloadEmailsByLabel() to continue processing

Testing and Utility Functions

testLabelCount()

Comprehensive readiness test - RUN THIS FIRST

Test Sequence:

  1. Label Existence: Verifies the specified Gmail label exists
  2. Thread Count: Gets total number of threads with the label
  3. Message Analysis: Analyzes sample threads for size and structure
  4. Parsing Test: Tests email parsing on sample messages

Output:

  • Detailed test results and statistics
  • Available Gmail labels if target label not found
  • Thread size analysis (min/max/average messages per thread)
  • Parsing success rate on sample data

testSmallSample()

Processes 3 sample emails with real-time output - RUN THIS SECOND

Purpose:

  • Validates parsing logic with real data
  • Tests spreadsheet creation and formatting
  • Verifies real-time output functionality
  • Creates test spreadsheet for review

Process:

  • Creates test spreadsheet with all sheets
  • Processes up to 3 labeled emails
  • Applies full formatting
  • Returns test spreadsheet URL

checkCheckpointStatus()

Displays current checkpoint information

Shows:

  • Spreadsheet ID and URL
  • Processing progress (threads/emails)
  • Start time and label name
  • Resume instructions

clearCheckpointAndTriggers()

Emergency function to clear stuck checkpoints and triggers

Use Cases:

  • Stuck or corrupted checkpoint data
  • Multiple resume triggers created
  • Starting completely fresh export

getCurrentSpreadsheetUrl()

Gets URL of currently active export spreadsheet

Returns: Spreadsheet URL or null if no active export


Data Structure

Main Sheet Columns

  1. Thread Number: Sequential processing number
  2. Confirmation Number: OCM form confirmation ID
  3. Submission Date: Date form was submitted
  4. First Name: Submitter's first name
  5. Last Name: Submitter's last name
  6. Email Address: Contact email
  7. Phone Number: Contact phone
  8. Fax: Fax number (often empty)
  9. Organization: Organization name (often empty)
  10. Street Address: Primary street address
  11. Street Address (cont.): Address continuation
  12. City: City name
  13. Province: Province/state
  14. Country: Country name
  15. Postal Code: Postal/ZIP code
  16. Full Address: Constructed complete address
  17. Subject: Form subject line
  18. Comments: Form comments/message
  19. Gmail Date: Email received date
  20. Gmail From: Email sender
  21. Gmail Subject: Email subject line
  22. Thread ID: Gmail thread identifier
  23. Email ID: Gmail message identifier

Usage Instructions

Initial Setup

  1. Update LABEL_NAME constant with your Gmail label
  2. Run testLabelCount() to verify setup
  3. Run testSmallSample() to test parsing
  4. Review test results before full export

Full Export

  1. Run downloadEmailsByLabel() for complete export
  2. Monitor console logs for progress
  3. Export automatically resumes if interrupted
  4. Check email for completion notification

Troubleshooting

  • Use checkCheckpointStatus() to view progress
  • Use clearCheckpointAndTriggers() to reset if stuck
  • Check error and log sheets for detailed information
  • Verify Gmail label exists and contains expected emails

Performance Notes

  • Processes ~10-20 emails per minute depending on content
  • Automatically checkpoints every 5 minutes
  • Resumes automatically with 1-minute delay
  • Real-time output shows immediate results
  • Handles large exports (1000+ emails) through checkpointing