Automate Excel Reports with Python: A Practical Guide Using pandas and openpyxl
Build a complete Python pipeline that reads raw data, transforms it, and generates formatted Excel reports — then schedule it to run automatically. Includes a full copy-paste script.
AI Generated ImageIf you spend hours each week updating Excel reports, you are solving the wrong problem.
The spreadsheet is not the issue — the manual process is. Every copy-paste, formula check, and resend request is time you will never reclaim.
Python automation replaces all of that with a repeatable pipeline: raw data in, formatted report out, zero manual steps.
This guide walks through a complete working example. By the end, you will have a script you can copy, adapt to your data, and schedule to run daily.
# The Pipeline
Every Excel automation follows the same flow:
flowchart LR S[Source Data] --> P[Python] P --> C[Clean] C --> A[Aggregate] A --> F[Format] F --> E[Excel Report] E --> SC["Schedule\n(cron / Task Scheduler)"]
We will build each stage, then wire them together into a single script.
# What You Will Need
pip install pandas openpyxl
- pandas — fast data loading, cleaning, and aggregation
- openpyxl — Excel file creation with formatting, styles, and formulas
# The Problem: Manual Reporting
If you recognise any of these, this guide is for you:
- Copying data between spreadsheets weekly or monthly
- Manually updating charts, pivot tables, or summary rows
- Sending the same report to the same people on a schedule
- Spending more time formatting than analysing
- Getting burned by copy-paste errors that nobody catches until the meeting
One team I worked with spent 8 hours per week across four people doing exactly this. After automation: zero hours.
# Step 1: Load the Data with pandas
Say you have a sales dataset — sales_data.xlsx — with columns for date, region, product, quantity, and revenue.
import pandas as pd
df = pd.read_excel("sales_data.xlsx")
print(df.head())
date region product quantity revenue
0 2026-03-01 North Widget Pro 150 4500.00
1 2026-03-01 South Widget Pro 90 2700.00
2 2026-03-02 North Widget Lite 200 3000.00
3 2026-03-02 South Widget Lite 120 1800.00
4 2026-03-03 North Widget Pro 175 5250.00
pandas handles .xlsx, .csv, database connections, and APIs — so this same pattern works regardless of where your data lives.
# Step 2: Clean and Transform
Real data is messy. Handle that before aggregating:
# Drop rows with missing revenue
df = df.dropna(subset=["revenue"])
# Ensure correct types
df["date"] = pd.to_datetime(df["date"])
df["revenue"] = df["revenue"].astype(float)
# Add derived columns
df["month"] = df["date"].dt.to_period("M")
df["avg_price"] = df["revenue"] / df["quantity"]
# Step 3: Aggregate
This is where the report takes shape. Group by whatever dimensions matter for your business:
# Summary by region
region_summary = (
df.groupby("region")
.agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
.reset_index()
)
region_summary["avg_price"] = (
region_summary["total_revenue"] / region_summary["total_units"]
)
# Summary by product
product_summary = (
df.groupby("product")
.agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
.reset_index()
)
Output:
| region | total_units | total_revenue | avg_price |
|---|---|---|---|
| North | 525 | 12,750.00 | 24.29 |
| South | 210 | 4,500.00 | 21.43 |
# Step 4: Generate a Formatted Excel Report
This is where openpyxl shines. pandas can write to Excel, but openpyxl gives you full control over formatting — headers, colours, number formats, merged cells:
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from datetime import datetime
def generate_report(region_data, product_data, output_path):
wb = Workbook()
# --- Region Summary Sheet ---
ws = wb.active
ws.title = "By Region"
header_font = Font(bold=True, color="FFFFFF", size=11)
header_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
currency_fmt = "£#,##0.00"
# Title
ws.merge_cells("A1:D1")
ws["A1"] = f"Sales Report — {datetime.now().strftime('%B %Y')}"
ws["A1"].font = Font(bold=True, size=16)
# Headers
headers = ["Region", "Units Sold", "Revenue", "Avg Price"]
for col, header in enumerate(headers, 1):
cell = ws.cell(row=3, column=col, value=header)
cell.font = header_font
cell.fill = header_fill
cell.alignment = Alignment(horizontal="center")
# Data
for i, row in region_data.iterrows():
r = i + 4
ws.cell(row=r, column=1, value=row["region"])
ws.cell(row=r, column=2, value=row["total_units"])
c3 = ws.cell(row=r, column=3, value=row["total_revenue"])
c3.number_format = currency_fmt
c4 = ws.cell(row=r, column=4, value=round(row["avg_price"], 2))
c4.number_format = currency_fmt
# Total row
total_row = len(region_data) + 4
ws.cell(row=total_row, column=1, value="TOTAL").font = Font(bold=True)
ws.cell(row=total_row, column=2, value=region_data["total_units"].sum()).font = Font(bold=True)
total_rev = ws.cell(row=total_row, column=3, value=region_data["total_revenue"].sum())
total_rev.number_format = currency_fmt
total_rev.font = Font(bold=True)
# Auto-fit columns
for col in ws.columns:
max_len = max(len(str(cell.value or "")) for cell in col)
ws.column_dimensions[col[0].column_letter].width = max_len + 4
# --- Product Summary Sheet ---
ws2 = wb.create_sheet("By Product")
ws2.merge_cells("A1:C1")
ws2["A1"] = "Product Breakdown"
ws2["A1"].font = Font(bold=True, size=14)
for col, header in enumerate(["Product", "Units Sold", "Revenue"], 1):
cell = ws2.cell(row=3, column=col, value=header)
cell.font = header_font
cell.fill = header_fill
cell.alignment = Alignment(horizontal="center")
for i, row in product_data.iterrows():
r = i + 4
ws2.cell(row=r, column=1, value=row["product"])
ws2.cell(row=r, column=2, value=row["total_units"])
c = ws2.cell(row=r, column=3, value=row["total_revenue"])
c.number_format = currency_fmt
for col in ws2.columns:
max_len = max(len(str(cell.value or "")) for cell in col)
ws2.column_dimensions[col[0].column_letter].width = max_len + 4
wb.save(output_path)
return output_path
What the output looks like:
- Sheet 1 ("By Region") — a styled summary table with blue headers, currency-formatted revenue, and a bold total row
- Sheet 2 ("By Product") — product-level breakdown in the same format
Both sheets have auto-fitted column widths and consistent number formatting. No manual formatting required.
# Step 5: Run It End-to-End
if __name__ == "__main__":
# Load
df = pd.read_excel("sales_data.xlsx")
# Clean
df = df.dropna(subset=["revenue"])
df["revenue"] = df["revenue"].astype(float)
# Aggregate
region_summary = (
df.groupby("region")
.agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
.reset_index()
)
region_summary["avg_price"] = region_summary["total_revenue"] / region_summary["total_units"]
product_summary = (
df.groupby("product")
.agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
.reset_index()
)
# Generate
output = generate_report(
region_summary,
product_summary,
f"report_{datetime.now().strftime('%Y%m%d')}.xlsx",
)
print(f"Report generated: {output}")
Report generated: report_20260330.xlsx
What used to take 2+ hours of manual work now runs in under 5 seconds.
# Step 6: Schedule It to Run Automatically
A script you run manually is useful. A script that runs itself is transformational.
# On Linux / macOS (cron)
Run crontab -e and add:
# Generate the sales report every weekday at 7:00 AM
0 7 * * 1-5 /usr/bin/python3 /path/to/generate_report.py >> /var/log/report.log 2>&1
# On Windows (Task Scheduler)
# Create a scheduled task that runs every weekday at 7:00 AM
$action = New-ScheduledTaskAction -Execute "python" -Argument "C:\reports\generate_report.py"
$trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Monday,Tuesday,Wednesday,Thursday,Friday -At 7am
Register-ScheduledTask -Action $action -Trigger $trigger -TaskName "DailySalesReport" -Description "Generate automated sales report"
Now the report lands on your desk (or in your inbox, if you add email delivery) before your first coffee.
# Real-World Scenarios
The sales example above is just one pattern. The same pipeline structure applies to:
# Finance — Monthly Close Reports
# Pull transactions, reconcile across accounts, generate P&L summary
df = pd.read_excel("transactions.xlsx")
pl = df.groupby("category").agg(
debits=("debit", "sum"),
credits=("credit", "sum"),
).reset_index()
pl["net"] = pl["credits"] - pl["debits"]
# Ecommerce — Daily KPI Dashboard
# Orders, revenue, refund rate, top products — refreshed every morning
orders = pd.read_csv("shopify_export.csv")
daily = orders.groupby(orders["created_at"].str[:10]).agg(
order_count=("id", "count"),
revenue=("total_price", "sum"),
).reset_index()
# Marketing — Campaign Performance Tracking
# Aggregate spend, impressions, clicks, conversions across channels
campaigns = pd.concat([
pd.read_csv("google_ads.csv"),
pd.read_csv("meta_ads.csv"),
])
summary = campaigns.groupby("campaign_name").agg(
spend=("cost", "sum"),
clicks=("clicks", "sum"),
conversions=("conversions", "sum"),
).reset_index()
summary["cpa"] = summary["spend"] / summary["conversions"]
The data sources change. The pipeline does not.
# Python vs Other Excel Automation Tools
| Feature | VBA / Macros | Power Query | Python |
|---|---|---|---|
| Learning curve | Medium | Low | Medium |
| Data sources | Excel only | Excel, SQL, web | Anything (APIs, DBs, files, web) |
| Scheduling | Limited | Manual refresh | Full (cron, Task Scheduler) |
| Scalability | Poor (crashes on large files) | Medium | Excellent (millions of rows) |
| Version control | Difficult | Not practical | Git-native |
| Error handling | Basic | Minimal | Full try/except, logging |
| Reusability | Copy-paste macros | Per-workbook | Import as modules |
When to stick with Excel: one-off analysis, quick ad hoc pivots, or when the audience needs to edit the data themselves.
When to use Python: anything recurring, anything pulling from multiple sources, anything that needs to be reliable and auditable.
# Full Working Script (Copy-Paste)
Here is the complete, self-contained script. Save it as generate_report.py, point it at your data, and run:
"""
Automated Excel Report Generator
Reads sales data, aggregates by region and product, outputs a formatted .xlsx report.
"""
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from datetime import datetime
def load_and_clean(filepath):
df = pd.read_excel(filepath)
df = df.dropna(subset=["revenue"])
df["revenue"] = df["revenue"].astype(float)
return df
def aggregate(df):
region = (
df.groupby("region")
.agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
.reset_index()
)
region["avg_price"] = region["total_revenue"] / region["total_units"]
product = (
df.groupby("product")
.agg(total_units=("quantity", "sum"), total_revenue=("revenue", "sum"))
.reset_index()
)
return region, product
def write_sheet(wb, title, headers, data_rows, currency_cols):
ws = wb.create_sheet(title) if wb.sheetnames != ["Sheet"] else wb.active
if ws.title == "Sheet":
ws.title = title
hdr_font = Font(bold=True, color="FFFFFF", size=11)
hdr_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
currency_fmt = "£#,##0.00"
ws.merge_cells(f"A1:{chr(64 + len(headers))}1")
ws["A1"] = f"{title} — {datetime.now().strftime('%B %Y')}"
ws["A1"].font = Font(bold=True, size=14)
for col, header in enumerate(headers, 1):
cell = ws.cell(row=3, column=col, value=header)
cell.font = hdr_font
cell.fill = hdr_fill
cell.alignment = Alignment(horizontal="center")
for r_idx, row_data in enumerate(data_rows, 4):
for c_idx, value in enumerate(row_data, 1):
cell = ws.cell(row=r_idx, column=c_idx, value=value)
if c_idx in currency_cols:
cell.number_format = currency_fmt
for col in ws.columns:
max_len = max(len(str(c.value or "")) for c in col)
ws.column_dimensions[col[0].column_letter].width = max_len + 4
def generate_report(region_df, product_df, output_path):
wb = Workbook()
region_rows = [
(r["region"], r["total_units"], r["total_revenue"], round(r["avg_price"], 2))
for _, r in region_df.iterrows()
]
write_sheet(wb, "By Region", ["Region", "Units Sold", "Revenue", "Avg Price"], region_rows, {3, 4})
product_rows = [
(r["product"], r["total_units"], r["total_revenue"])
for _, r in product_df.iterrows()
]
write_sheet(wb, "By Product", ["Product", "Units Sold", "Revenue"], product_rows, {3})
wb.save(output_path)
print(f"Report generated: {output_path}")
if __name__ == "__main__":
df = load_and_clean("sales_data.xlsx")
region_summary, product_summary = aggregate(df)
generate_report(
region_summary,
product_summary,
f"report_{datetime.now().strftime('%Y%m%d')}.xlsx",
)
Adapt it to your data by changing the column names in aggregate() and the headers in generate_report().
# What This Replaces
| Before (manual) | After (automated) |
|---|---|
| 2+ hours per report | Under 5 seconds |
| Copy-paste errors | Zero — data flows directly |
| Inconsistent formatting | Identical every time |
| "Can you resend with the latest numbers?" | Always up to date |
| Runs when someone remembers | Runs on schedule |
# Next Steps
This script is a starting point. Production automation systems typically add:
- Email delivery —
smtplibto send the report to stakeholders automatically - Multiple data sources — pull from databases, APIs, and multiple files in one pipeline
- Error handling and logging — validate data and alert on anomalies before the report goes out
- Charts and dashboards — openpyxl supports bar charts, line charts, and conditional formatting
If you are spending hours on repetitive reporting, automation services can eliminate that entirely. Every system I build follows the same principle: identify the repetitive, automate the predictable, free up time for the work that matters.
Get in touch to discuss automating your reporting workflows.
Enjoyed this article?
Get notified when I publish new articles on automation, ecommerce, and data engineering.