Automate Excel Reports with Python: A Practical Guide Using pandas and openpyxl

29.03.2026·8 min read·Automation

Build a complete Python pipeline that reads raw data, transforms it, and generates formatted Excel reports — then schedule it to run automatically. Includes a full copy-paste script.

Automate Excel Reports with Python: A Practical Guide Using pandas and openpyxl

AI Generated Image

If you spend hours each week updating Excel reports, you are solving the wrong problem.

The spreadsheet is not the issue — the manual process is. Every copy-paste, formula check, and resend request is time you will never reclaim.

Python automation replaces all of that with a repeatable pipeline: raw data in, formatted report out, zero manual steps.

This guide walks through a complete working example. By the end, you will have a script you can copy, adapt to your data, and schedule to run daily.

# The Pipeline

Every Excel automation follows the same flow:

flowchart LR
  S[Source Data] --> P[Python]
  P --> C[Clean]
  C --> A[Aggregate]
  A --> F[Format]
  F --> E[Excel Report]
  E --> SC["Schedule\n(cron / Task Scheduler)"]

We will build each stage, then wire them together into a single script.

# What You Will Need

bash

pip install pandas openpyxl

pandas — fast data loading, cleaning, and aggregation
openpyxl — Excel file creation with formatting, styles, and formulas

# The Problem: Manual Reporting

If you recognise any of these, this guide is for you:

Copying data between spreadsheets weekly or monthly
Manually updating charts, pivot tables, or summary rows
Sending the same report to the same people on a schedule
Spending more time formatting than analysing
Getting burned by copy-paste errors that nobody catches until the meeting

One team I worked with spent 8 hours per week across four people doing exactly this. After automation: zero hours.

# Step 1: Load the Data with pandas

Say you have a sales dataset — sales_data.xlsx — with columns for date, region, product, quantity, and revenue.

python

import pandas as pd

df = pd.read_excel("sales_data.xlsx")
print(df.head())

text

        date   region       product  quantity  revenue
0 2026-03-01    North  Widget Pro        150  4500.00
1 2026-03-01    South  Widget Pro         90  2700.00
2 2026-03-02    North  Widget Lite       200  3000.00
3 2026-03-02    South  Widget Lite       120  1800.00
4 2026-03-03    North  Widget Pro        175  5250.00

pandas handles .xlsx, .csv, database connections, and APIs — so this same pattern works regardless of where your data lives.

# Step 2: Clean and Transform

Real data is messy. Handle that before aggregating:

python

# Drop rows with missing revenue
df = df.dropna(subset=["revenue"])

# Ensure correct types
df["date"] = pd.to_datetime(df["date"])
df["revenue"] = df["revenue"].astype(float)

# Add derived columns
df["month"] = df["date"].dt.to_period("M")
df["avg_price"] = df["revenue"] / df["quantity"]

# Step 3: Aggregate

This is where the report takes shape. Group by whatever dimensions matter for your business:

python

# Summary by region
region_summary = (
    df.groupby("region")
    .agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
    .reset_index()
)
region_summary["avg_price"] = (
    region_summary["total_revenue"] / region_summary["total_units"]
)

# Summary by product
product_summary = (
    df.groupby("product")
    .agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
    .reset_index()
)

Output:

region	total_units	total_revenue	avg_price
North	525	12,750.00	24.29
South	210	4,500.00	21.43

# Step 4: Generate a Formatted Excel Report

This is where openpyxl shines. pandas can write to Excel, but openpyxl gives you full control over formatting — headers, colours, number formats, merged cells:

python

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from datetime import datetime

def generate_report(region_data, product_data, output_path):
    wb = Workbook()

    # --- Region Summary Sheet ---
    ws = wb.active
    ws.title = "By Region"

    header_font = Font(bold=True, color="FFFFFF", size=11)
    header_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
    currency_fmt = "£#,##0.00"

    # Title
    ws.merge_cells("A1:D1")
    ws["A1"] = f"Sales Report — {datetime.now().strftime('%B %Y')}"
    ws["A1"].font = Font(bold=True, size=16)

    # Headers
    headers = ["Region", "Units Sold", "Revenue", "Avg Price"]
    for col, header in enumerate(headers, 1):
        cell = ws.cell(row=3, column=col, value=header)
        cell.font = header_font
        cell.fill = header_fill
        cell.alignment = Alignment(horizontal="center")

    # Data
    for i, row in region_data.iterrows():
        r = i + 4
        ws.cell(row=r, column=1, value=row["region"])
        ws.cell(row=r, column=2, value=row["total_units"])
        c3 = ws.cell(row=r, column=3, value=row["total_revenue"])
        c3.number_format = currency_fmt
        c4 = ws.cell(row=r, column=4, value=round(row["avg_price"], 2))
        c4.number_format = currency_fmt

    # Total row
    total_row = len(region_data) + 4
    ws.cell(row=total_row, column=1, value="TOTAL").font = Font(bold=True)
    ws.cell(row=total_row, column=2, value=region_data["total_units"].sum()).font = Font(bold=True)
    total_rev = ws.cell(row=total_row, column=3, value=region_data["total_revenue"].sum())
    total_rev.number_format = currency_fmt
    total_rev.font = Font(bold=True)

    # Auto-fit columns
    for col in ws.columns:
        max_len = max(len(str(cell.value or "")) for cell in col)
        ws.column_dimensions[col[0].column_letter].width = max_len + 4

    # --- Product Summary Sheet ---
    ws2 = wb.create_sheet("By Product")
    ws2.merge_cells("A1:C1")
    ws2["A1"] = "Product Breakdown"
    ws2["A1"].font = Font(bold=True, size=14)

    for col, header in enumerate(["Product", "Units Sold", "Revenue"], 1):
        cell = ws2.cell(row=3, column=col, value=header)
        cell.font = header_font
        cell.fill = header_fill
        cell.alignment = Alignment(horizontal="center")

    for i, row in product_data.iterrows():
        r = i + 4
        ws2.cell(row=r, column=1, value=row["product"])
        ws2.cell(row=r, column=2, value=row["total_units"])
        c = ws2.cell(row=r, column=3, value=row["total_revenue"])
        c.number_format = currency_fmt

    for col in ws2.columns:
        max_len = max(len(str(cell.value or "")) for cell in col)
        ws2.column_dimensions[col[0].column_letter].width = max_len + 4

    wb.save(output_path)
    return output_path

What the output looks like:

Sheet 1 ("By Region") — a styled summary table with blue headers, currency-formatted revenue, and a bold total row
Sheet 2 ("By Product") — product-level breakdown in the same format

Both sheets have auto-fitted column widths and consistent number formatting. No manual formatting required.

# Step 5: Run It End-to-End

python

if __name__ == "__main__":
    # Load
    df = pd.read_excel("sales_data.xlsx")

    # Clean
    df = df.dropna(subset=["revenue"])
    df["revenue"] = df["revenue"].astype(float)

    # Aggregate
    region_summary = (
        df.groupby("region")
        .agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
        .reset_index()
    )
    region_summary["avg_price"] = region_summary["total_revenue"] / region_summary["total_units"]

    product_summary = (
        df.groupby("product")
        .agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
        .reset_index()
    )

    # Generate
    output = generate_report(
        region_summary,
        product_summary,
        f"report_{datetime.now().strftime('%Y%m%d')}.xlsx",
    )
    print(f"Report generated: {output}")

text

Report generated: report_20260330.xlsx

What used to take 2+ hours of manual work now runs in under 5 seconds.

# Step 6: Schedule It to Run Automatically

A script you run manually is useful. A script that runs itself is transformational.

# On Linux / macOS (cron)

Run crontab -e and add:

bash

# Generate the sales report every weekday at 7:00 AM
0 7 * * 1-5 /usr/bin/python3 /path/to/generate_report.py >> /var/log/report.log 2>&1

# On Windows (Task Scheduler)

powershell

# Create a scheduled task that runs every weekday at 7:00 AM
$action = New-ScheduledTaskAction -Execute "python" -Argument "C:\reports\generate_report.py"
$trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Monday,Tuesday,Wednesday,Thursday,Friday -At 7am
Register-ScheduledTask -Action $action -Trigger $trigger -TaskName "DailySalesReport" -Description "Generate automated sales report"

Now the report lands on your desk (or in your inbox, if you add email delivery) before your first coffee.

# Real-World Scenarios

The sales example above is just one pattern. The same pipeline structure applies to:

# Finance — Monthly Close Reports

python

# Pull transactions, reconcile across accounts, generate P&L summary
df = pd.read_excel("transactions.xlsx")
pl = df.groupby("category").agg(
    debits=("debit", "sum"),
    credits=("credit", "sum"),
).reset_index()
pl["net"] = pl["credits"] - pl["debits"]

# Ecommerce — Daily KPI Dashboard

python

# Orders, revenue, refund rate, top products — refreshed every morning
orders = pd.read_csv("shopify_export.csv")
daily = orders.groupby(orders["created_at"].str[:10]).agg(
    order_count=("id", "count"),
    revenue=("total_price", "sum"),
).reset_index()

# Marketing — Campaign Performance Tracking

python

# Aggregate spend, impressions, clicks, conversions across channels
campaigns = pd.concat([
    pd.read_csv("google_ads.csv"),
    pd.read_csv("meta_ads.csv"),
])
summary = campaigns.groupby("campaign_name").agg(
    spend=("cost", "sum"),
    clicks=("clicks", "sum"),
    conversions=("conversions", "sum"),
).reset_index()
summary["cpa"] = summary["spend"] / summary["conversions"]

The data sources change. The pipeline does not.

# Python vs Other Excel Automation Tools

Feature	VBA / Macros	Power Query	Python
Learning curve	Medium	Low	Medium
Data sources	Excel only	Excel, SQL, web	Anything (APIs, DBs, files, web)
Scheduling	Limited	Manual refresh	Full (cron, Task Scheduler)
Scalability	Poor (crashes on large files)	Medium	Excellent (millions of rows)
Version control	Difficult	Not practical	Git-native
Error handling	Basic	Minimal	Full try/except, logging
Reusability	Copy-paste macros	Per-workbook	Import as modules

When to stick with Excel: one-off analysis, quick ad hoc pivots, or when the audience needs to edit the data themselves.

When to use Python: anything recurring, anything pulling from multiple sources, anything that needs to be reliable and auditable.

# Full Working Script (Copy-Paste)

Here is the complete, self-contained script. Save it as generate_report.py, point it at your data, and run:

python

"""
Automated Excel Report Generator
Reads sales data, aggregates by region and product, outputs a formatted .xlsx report.
"""
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from datetime import datetime


def load_and_clean(filepath):
    df = pd.read_excel(filepath)
    df = df.dropna(subset=["revenue"])
    df["revenue"] = df["revenue"].astype(float)
    return df


def aggregate(df):
    region = (
        df.groupby("region")
        .agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
        .reset_index()
    )
    region["avg_price"] = region["total_revenue"] / region["total_units"]

    product = (
        df.groupby("product")
        .agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
        .reset_index()
    )
    return region, product


def write_sheet(wb, title, headers, data_rows, currency_cols):
    ws = wb.create_sheet(title) if wb.sheetnames != ["Sheet"] else wb.active
    if ws.title == "Sheet":
        ws.title = title

    hdr_font = Font(bold=True, color="FFFFFF", size=11)
    hdr_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
    currency_fmt = "£#,##0.00"

    ws.merge_cells(f"A1:{chr(64 + len(headers))}1")
    ws["A1"] = f"{title} — {datetime.now().strftime('%B %Y')}"
    ws["A1"].font = Font(bold=True, size=14)

    for col, header in enumerate(headers, 1):
        cell = ws.cell(row=3, column=col, value=header)
        cell.font = hdr_font
        cell.fill = hdr_fill
        cell.alignment = Alignment(horizontal="center")

    for r_idx, row_data in enumerate(data_rows, 4):
        for c_idx, value in enumerate(row_data, 1):
            cell = ws.cell(row=r_idx, column=c_idx, value=value)
            if c_idx in currency_cols:
                cell.number_format = currency_fmt

    for col in ws.columns:
        max_len = max(len(str(c.value or "")) for c in col)
        ws.column_dimensions[col[0].column_letter].width = max_len + 4


def generate_report(region_df, product_df, output_path):
    wb = Workbook()

    region_rows = [
        (r["region"], r["total_units"], r["total_revenue"], round(r["avg_price"], 2))
        for _, r in region_df.iterrows()
    ]
    write_sheet(wb, "By Region", ["Region", "Units Sold", "Revenue", "Avg Price"], region_rows, {3, 4})

    product_rows = [
        (r["product"], r["total_units"], r["total_revenue"])
        for _, r in product_df.iterrows()
    ]
    write_sheet(wb, "By Product", ["Product", "Units Sold", "Revenue"], product_rows, {3})

    wb.save(output_path)
    print(f"Report generated: {output_path}")


if __name__ == "__main__":
    df = load_and_clean("sales_data.xlsx")
    region_summary, product_summary = aggregate(df)
    generate_report(
        region_summary,
        product_summary,
        f"report_{datetime.now().strftime('%Y%m%d')}.xlsx",
    )

Adapt it to your data by changing the column names in aggregate() and the headers in generate_report().

# What This Replaces

Before (manual)	After (automated)
2+ hours per report	Under 5 seconds
Copy-paste errors	Zero — data flows directly
Inconsistent formatting	Identical every time
"Can you resend with the latest numbers?"	Always up to date
Runs when someone remembers	Runs on schedule

# Next Steps

This script is a starting point. Production automation systems typically add:

Email delivery — smtplib to send the report to stakeholders automatically
Multiple data sources — pull from databases, APIs, and multiple files in one pipeline
Error handling and logging — validate data and alert on anomalies before the report goes out
Charts and dashboards — openpyxl supports bar charts, line charts, and conditional formatting

If you are spending hours on repetitive reporting, automation services can eliminate that entirely. Every system I build follows the same principle: identify the repetitive, automate the predictable, free up time for the work that matters.

Get in touch to discuss automating your reporting workflows.

Enjoyed this article?

Get notified when I publish new articles on automation, ecommerce, and data engineering.

automate excel reports pythonpython excel automationpandas excel reportopenpyxl pythondaily reporting automationexcel dashboard automationpython script for excel reportsautomate spreadsheet pythonpython replace manual reportingscheduled excel reports