~/projects/outbound-report

outbound_report_

Multi-carrier daily report automation  ·  Python  ·  POP3 → XLSX → SMTP

ACTIVE 3 runs/day v1.3
01Overview
02Pipeline Flow
03Code & Highlights

// system_metrics

5
sources_merged
carriers / day
3
scheduled_runs
17:00 · 18:00 · 19:00
0
manual_steps
fully automated
.xlsb
output_format
live pivot table
45m
time_saved
per day

// dev_timeline

BRD REVIEW
6 hours
DEVELOPMENT
6 days
TESTING / QA
1 week
GO-LIVE
Active

// data_sources → output

CAIOD Report.xlsxreceived
CBLTL Daily Exception List.xlsxreceived
CBMR Daily Exception List.xlsxreceived
CCDaily Manifest.xlsxreceived
CDDaily Manifest Report.csvreceived
TRANSFORM
+ MERGE
Consolidated_Daily_Manifest.xlsb
schema19-col standardized
pivotauto-refreshed via xlwings
format.xlsb · freeze panes · filters
emailpivot HTML embedded in body
storagedate-stamped folder / day

// technical_highlights

SCHEDULE
3-Run Daily Dispatch
One bat file per run passes a mode flag via sys.argv. Threshold shifts per run (≥1700 / ≥1800), no external state.
DETECTION
Stateless Update Signal
_HHMM suffix embedded at download time. Re-runs check the filename timestamp. No lock files, no DB.
RESILIENCE
Fault-Tolerant Sources
Each source wrapped in its own try/except. Missing files are skipped; RuntimeError only when dfs is empty. Alert and report are independent.
PERF
POP3 Early Termination
Scans newest-first. Exits when all 5 targets are matched or the first email older than today is reached.
OUTPUT
Template-Driven Excel
Data written at A2 into a pre-built template. xlwings refreshes the pivot before .xlsb export, preserving all formatting.
EMAIL
Pivot in Email Body
Post-refresh pivot rendered as an inline HTML table in the email body. Recipients see the summary without opening the attachment.

// dependencies

pythonruntime
pandastransform
xlwingsexcel automation
openpyxlexcel utils
poplibpop3 inbound
smtplibsmtp outbound
pytztz handling
python-dotenvcredentials
claude-codeAI dev

// execution_flow

Scheduled Run 17:00 · 18:00 · 19:00 Connect to POP3 Mailbox newest-first · stop at yesterday Match Email Against Targets subject · sender · date All 5 files received? No (partial) Yes Skip Missing Sources continue with received Send Failure Alert missing + received detail Transform Each Source reorder · split · format · fill Consolidate + Refresh Pivot write to template → .xlsb .xlsb already exists? No Yes Send Report first run New files? 2nd:≥17:00 · 3rd:≥18:00 Yes No Updated re-send report No Update skip notify

click to zoom in

// error_handling

POP3 Connection Failure
Returns DownloadResult(success=False). Pipeline continues with whatever files were already downloaded.
Missing Source File
Source-level FileNotFoundError → skip & continue. Only raises RuntimeError if all sources are missing.
Email Send Failure
Missing credentials or SMTP error → logged with full traceback, returns False. No crash.
Bad Email Date Header
Unparseable Date header → silently skipped. Scan continues to the next message.
POP3 Quit Error
Close failure logged as warning. Connection cleared in finally.
Partial Download
Returns success=False with partial received_times. Failure alert lists what arrived and what didn't.

// code_highlights · main.example.py

In-Place Address Column Expansion provider_d_report()
# Split one address string into 5 structured fields
# and splice them back at the original column position
d_col = col_dict["D"]
split_df = (
    df[d_col].str.split(",", expand=True)
             .apply(lambda x: x.str.strip())
)
split_df.columns = [
    "ConsigneeAddress", "City",
    "State", "Country", "Zip"
]
d_idx = list(df.columns).index(d_col)
df.drop(columns=[d_col], inplace=True)
for i, col in enumerate(split_df.columns):
    df.insert(d_idx + i, col, split_df[col])

# Null-fill keeps pivot clean — no (Blank) label rows
df[col_dict["K"]] = df[col_dict["K"]].fillna(
    "SHIPMENT EN-ROUTE-TO-DEST"
)
Fault-Tolerant Source Dispatch Table combine_excel()
# Each source is isolated — one failure never blocks others
dfs = []
for label, loader in [
    ("Provider A", lambda: pd.read_excel(
        find_file("Provider A*.xlsx"))),
    ("Provider B", lambda: pd.read_excel(
        find_file("Provider B*.xlsx")).iloc[:, 0:19]),
    ("Provider C", provider_c_report),
    ("Provider D", provider_d_report),
    ("Provider E", lambda: pd.read_excel(
        find_file("Provider E*.xlsx")).iloc[:, 0:19]),
]:
    try:
        df = loader()
        df.columns = template_cols[:len(df.columns)]
        dfs.append(df)
    except FileNotFoundError:
        print(f"[SKIP] {label} not found")

if not dfs:
    raise RuntimeError("No files available")
df_result = pd.concat(dfs, ignore_index=True)
Position-Safe Column Block Reorder provider_c_report()
# col_dict rebuilt after every mutation —
# letter keys stay stable regardless of insert/drop order
col_dict = {
    get_column_letter(i + 1): col
    for i, col in enumerate(df.columns)
}
# Extract block, drop, re-insert at target position
block = df.loc[
    :, col_dict["I"]:col_dict["N"]
].copy()
df.drop(columns=block.columns.tolist(), inplace=True)
col_dict = {
    get_column_letter(i + 1): col
    for i, col in enumerate(df.columns)
}
c_idx = list(df.columns).index(col_dict["C"])
for i, col in enumerate(block.columns):
    df.insert(c_idx + i, col, block[col].values)
Stateless Update Detection via Filename Timestamp __main__
# Files saved as name_HHMM.ext at download time.
# Each run checks only files newer than its threshold
# — no lock files, no external state required.
mode = sys.argv[1] if len(sys.argv) > 1 else "initial"

has_after_1700 = any(
    int(Path(f).stem.rsplit('_', 1)[-1]) >= 1700
    for f in result.downloaded_files
)
if not xlsb.exists():
    send_report(..., is_updated=False)
elif mode == "second" and has_after_1700:
    send_report(..., is_updated=True)
elif mode == "third" and has_after_1800:
    send_report(..., is_updated=True)
else:
    send_skip(...)

// dev_process · Claude Code Conversations

Multi-Run Idempotency: Logic First
TJ
Extending to 3 daily runs. Files already have a _HHMM suffix at download time — use that as the update signal, no new state. First run gate is xlsb.exists(). Runs 2 and 3 use per-run timestamp thresholds. Verify all 6 cases before touching code.
C
All 6 cases verified — no overlap. Implemented sys.argv mode dispatch (initial / second / third) with thresholds ≥1700 / ≥1800. Created run_initial.bat, run_second.bat, run_third.bat.
Change Impact Analysis Before Edit
TJ
The name field in EMAIL_TARGETS — I know it renders in the email body, but does it get used anywhere else? Matching logic uses contains and sender, so I'm assuming it's safe to rename. Confirm before I change it.
C
Also used in log output and as the received_times dict key — but since both writer and reader pull from the same config, renaming is safe. Matching is unaffected.
TJ
Good. Apply carrier prefixes: CA / CB / CC / CD.
Partial Failure Handling: Decoupled Paths
TJ
Not all 5 files arrive every day. The consolidation, report send, and failure alert must be independent. A missing file shouldn't block the report, and a send failure shouldn't suppress the alert. RuntimeError only if every source is missing.
C
Each loader isolated in try/except FileNotFoundError inside a dispatch table. Alert and report are independent. Neither blocks the other. RuntimeError raised only when dfs is empty.
Module Split: API Contract Preserved
TJ
Split the email pipeline into email_setup, downloader, sender. Public function signatures and return types must stay identical. main.py should only need import line changes.
C
Done. All signatures and return types preserved. main.py diff is 3 import lines, zero logic changes.