Build a Python AI Agent for Automated Data Analysis

· 12 min read · Automation

Build an AI agent in Python that analyses datasets autonomously — writing and executing its own pandas code, generating summaries, and producing reports — with tool-use patterns, sandboxed execution, and guardrails to keep it from going off the rails.

Build a Python AI Agent for Automated Data Analysis

Most data analysis follows the same loop. You load a dataset. You check the shape and dtypes. You look for nulls. You compute summary stats. You group by a few columns and look for patterns. You make a chart. You write up what you found. Repeat until something interesting shows up.

This loop is tedious enough that you skip it on datasets that might have insights buried in them. You already know the drill, so you only analyse data when someone asks for it, not proactively.

An AI agent can run this loop for you. Not by generating a fixed script — you could write that yourself — but by deciding what to look at based on what it has already seen. It loads the data, checks the structure, picks an angle, writes code to investigate, reads the output, and decides what to do next. You point it at a CSV and come back to a finished report.

This guide builds that agent in Python. It uses OpenAI’s tool-calling API, executes pandas code in a sandbox, and produces a structured analysis report — with every guardrail you need so it does not delete your files or rack up a $50 API bill.

Who This Is For

  • Data engineers who want to automate the exploratory phase of data analysis
  • Developers building internal tools that let non-technical teams analyse data with natural language
  • Teams sitting on datasets that nobody has time to explore properly
  • Vibe coders who have seen AI agent demos and want to build one that actually does something useful

You need basic Python and a conceptual understanding of how LLM API calls work. The guide explains tool-calling from scratch.

Agent Architecture

The agent runs a loop: think, pick a tool, observe the result, repeat. Each iteration adds to the agent’s context, so it builds understanding incrementally. The loop stops when the agent calls the summarise tool, which means it has enough findings to write a report.

What You Will Need

pip install openai pandas tabulate
  • openai — LLM client with tool-calling support
  • pandas — what the agent uses to analyse data
  • tabulate — pretty-print DataFrames in the agent’s context

You will need an OpenAI API key:

export OPENAI_API_KEY="sk-..."

Step 1: Define the Agent’s Tools

Tools are functions the agent can call. Each tool has a name, a description the LLM reads, and a JSON schema for its arguments.

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "inspect_data",
            "description": (
                "Get the shape, column names, dtypes, null counts, "
                "and first 5 rows of the dataset. Call this first to "
                "understand the data before writing analysis code."
            ),
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_analysis",
            "description": (
                "Execute a pandas code snippet against the loaded dataset. "
                "The dataset is available as `df`. The code must produce "
                "output via print(). Do not modify df in place — use "
                "intermediate variables for transformed data."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Python code using pandas. Must print results.",
                    },
                    "reasoning": {
                        "type": "string",
                        "description": "One sentence explaining why this analysis step.",
                    },
                },
                "required": ["code", "reasoning"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "summarise",
            "description": (
                "Produce a final analysis report based on all findings so far. "
                "Call this when you have enough insights to answer the user's "
                "question or have completed a thorough exploration."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "report": {
                        "type": "string",
                        "description": "The final analysis report in markdown format.",
                    },
                },
                "required": ["report"],
            },
        },
    },
]

The descriptions matter. The LLM reads them to decide which tool to use. “Call this first” in the inspect_data description is a steering hint — without it, the agent sometimes jumps straight to writing code on data it has not seen yet.

Step 2: Sandboxed Code Execution

This is the most security-sensitive part. The agent writes code. That code runs on your machine. You need guardrails.

import subprocess
import tempfile
import os
import logging
from pathlib import Path

logger = logging.getLogger("agent")

# only these imports are allowed in agent-generated code
ALLOWED_IMPORTS = {"pandas", "numpy", "math", "statistics", "collections", "itertools"}

# patterns that should never appear in agent code
BLOCKED_PATTERNS = [
    "import os",
    "import sys",
    "import subprocess",
    "import shutil",
    "open(",         # no file I/O beyond the loaded dataframe
    "__import__",
    "exec(",
    "eval(",
    "compile(",
    "globals(",
    "locals(",
    "import socket",
    "import http",
    "import urllib",
    "requests.",
]


def validate_code(code: str) -> tuple[bool, str]:
    """Check agent-generated code for disallowed operations.

    Returns (is_safe, reason). Rejects code that tries to access
    the filesystem, network, or escape the sandbox.
    """
    for pattern in BLOCKED_PATTERNS:
        if pattern in code:
            return False, f"Blocked pattern found: {pattern}"

    # check imports against allowlist
    for line in code.splitlines():
        stripped = line.strip()
        if stripped.startswith("import ") or stripped.startswith("from "):
            # extract the top-level module name
            if stripped.startswith("from "):
                module = stripped.split()[1].split(".")[0]
            else:
                module = stripped.split()[1].split(".")[0]

            if module not in ALLOWED_IMPORTS:
                return False, f"Import not allowed: {module}"

    return True, "ok"


def execute_code(code: str, data_path: str, timeout: int = 30) -> str:
    """Run agent-generated code in a subprocess with the dataset loaded.

    The code runs in a fresh Python process with df pre-loaded.
    Output is captured from stdout. Errors are returned as strings.
    """
    is_safe, reason = validate_code(code)
    if not is_safe:
        return f"CODE REJECTED: {reason}"

    # wrap the agent's code with data loading
    wrapper = f"""
import pandas as pd
import numpy as np
from tabulate import tabulate

df = pd.read_csv("{data_path}")
# --- agent code below ---
{code}
"""

    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", delete=False, dir=tempfile.gettempdir()
    ) as f:
        f.write(wrapper)
        script_path = f.name

    try:
        result = subprocess.run(
            ["python", script_path],
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=tempfile.gettempdir(),  # restrict working directory
        )

        output = result.stdout.strip()
        if result.returncode != 0:
            error = result.stderr.strip()
            # truncate long tracebacks — the agent does not need 50 lines of stack
            error_lines = error.splitlines()
            if len(error_lines) > 10:
                error = "\n".join(error_lines[-10:])
            return f"ERROR:\n{error}"

        # truncate very long output to keep the context window manageable
        if len(output) > 3000:
            output = output[:3000] + "\n... (output truncated)"

        return output if output else "(no output — did you forget to print?)"

    except subprocess.TimeoutExpired:
        return f"TIMEOUT: Code took more than {timeout}s. Simplify your analysis."

    finally:
        os.unlink(script_path)

Three layers of protection. First, the code is scanned for dangerous patterns before it runs. Second, it runs in a subprocess, not in the agent’s own process — a crash does not take down the agent. Third, the working directory is /tmp so the code cannot reach your real files even if it somehow gets past the import check.

The output truncation is practical, not security. If the agent prints a 100,000-row DataFrame, the entire output goes into the LLM’s context window. That wastes tokens and confuses the model. 3,000 characters is enough for the agent to read the results and decide what to do next.

Step 3: Tool Execution Layer

Map tool names to actual functions.

import json
import pandas as pd


class ToolExecutor:
    """Execute tools that the agent requests."""

    def __init__(self, data_path: str):
        self.data_path = data_path

    def inspect_data(self) -> str:
        """Return dataset shape, dtypes, nulls, and sample rows."""
        df = pd.read_csv(self.data_path)
        parts = [
            f"Shape: {df.shape[0]:,} rows x {df.shape[1]} columns",
            f"\nColumns and types:\n{df.dtypes.to_string()}",
            f"\nNull counts:\n{df.isnull().sum().to_string()}",
            f"\nFirst 5 rows:\n{df.head().to_string()}",
            f"\nNumeric summary:\n{df.describe().to_string()}",
        ]
        return "\n".join(parts)

    def run_analysis(self, code: str, reasoning: str) -> str:
        """Execute pandas code in the sandbox."""
        logger.info(f"Running analysis: {reasoning}")
        return execute_code(code, self.data_path)

    def execute(self, tool_name: str, arguments: dict) -> str:
        """Dispatch a tool call to the right handler."""
        if tool_name == "inspect_data":
            return self.inspect_data()
        elif tool_name == "run_analysis":
            return self.run_analysis(
                code=arguments["code"],
                reasoning=arguments.get("reasoning", ""),
            )
        elif tool_name == "summarise":
            # the report is the final output, not a tool result
            return arguments["report"]
        else:
            return f"Unknown tool: {tool_name}"

Step 4: The Agent Loop

This is where it comes together. The agent loop sends messages to the LLM, gets back tool calls, executes them, feeds the results back, and repeats.

from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a data analyst agent. You have access to a dataset
loaded as a pandas DataFrame.

Your workflow:
1. First call inspect_data to understand the dataset
2. Then call run_analysis with pandas code to investigate patterns
3. Each code snippet must print() its output
4. After 4-8 analysis steps, call summarise with your findings

Rules:
- Always inspect the data before writing analysis code
- Write clean pandas code, no plots or charts
- If code errors, fix the code and try again
- Focus on actionable insights, not just descriptive stats
- Do not modify the original DataFrame in place"""


def run_agent(data_path: str, question: str | None = None,
              model: str = "gpt-4o-mini", max_steps: int = 15) -> str:
    """Run the analysis agent on a dataset.

    Args:
        data_path: path to a CSV file
        question: specific question to answer (or None for general exploration)
        model: which OpenAI model to use
        max_steps: safety cap on tool calls

    Returns:
        The final analysis report as a string.
    """
    user_msg = f"Dataset: {data_path}\n"
    if question:
        user_msg += f"Question: {question}"
    else:
        user_msg += "Explore this dataset and report the most interesting findings."

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_msg},
    ]

    executor = ToolExecutor(data_path)
    step = 0

    while step < max_steps:
        step += 1
        logger.info(f"Agent step {step}/{max_steps}")

        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=TOOLS,
            temperature=0.2,
        )

        choice = response.choices[0]

        # if the model responds with text and no tool call, it is done
        if choice.finish_reason == "stop":
            logger.info("Agent finished (no more tool calls)")
            return choice.message.content or "No report generated."

        if not choice.message.tool_calls:
            logger.warning("No tool calls and not stopped — ending")
            return choice.message.content or "No report generated."

        # add the assistant message (with tool calls) to history
        messages.append(choice.message)

        for tool_call in choice.message.tool_calls:
            fn_name = tool_call.function.name
            fn_args = json.loads(tool_call.function.arguments)

            logger.info(f"Tool call: {fn_name}")

            result = executor.execute(fn_name, fn_args)

            # if this is the summarise tool, we are done
            if fn_name == "summarise":
                logger.info("Agent produced final report")
                return result

            # feed the tool result back to the agent
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    logger.warning(f"Agent hit max steps ({max_steps})")
    return "Analysis incomplete — agent reached step limit."

The max_steps cap is a hard safety limit. Without it, a confused agent can loop forever, burning tokens. 15 steps is generous — most analyses finish in 6-10 steps.

Temperature is 0.2 — low enough for consistent analysis code, high enough that the agent varies its investigation approach across runs.

Step 5: Running the Agent

if __name__ == "__main__":
    import sys

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(name)s] %(levelname)s %(message)s",
    )

    data_path = sys.argv[1] if len(sys.argv) > 1 else "data/sales.csv"
    question = sys.argv[2] if len(sys.argv) > 2 else None

    report = run_agent(data_path, question)

    print("\n" + "=" * 60)
    print("ANALYSIS REPORT")
    print("=" * 60)
    print(report)

Usage

# general exploration
python agent.py data/sales.csv

# specific question
python agent.py data/sales.csv "Which product category has the highest return rate?"

What the Agent Actually Does

Here is a real run on a sales dataset, showing each step:

Agent step 1/15 — Tool call: inspect_data
Agent step 2/15 — Tool call: run_analysis (checking date range and order volume)
Agent step 3/15 — Tool call: run_analysis (revenue by category breakdown)
Agent step 4/15 — Tool call: run_analysis (month-over-month growth rates)
Agent step 5/15 — Tool call: run_analysis (top 10 customers by lifetime value)
Agent step 6/15 — Tool call: run_analysis (return rate by product category)
Agent step 7/15 — Tool call: summarise
Agent produced final report

Seven steps. The agent decided on its own to check the date range first, then break down revenue by category, calculate growth rates, find top customers, and investigate return rates. Each step’s output informed what it looked at next.

Step 6: Improving Agent Quality

Add a Review Step

The first version of the agent produces decent analysis but sometimes misses obvious things. Add a self-review step.

def run_agent_with_review(data_path: str, question: str | None = None,
                          model: str = "gpt-4o-mini") -> str:
    """Run the agent, then ask it to review and improve its own report."""
    initial_report = run_agent(data_path, question, model=model)

    review_prompt = f"""Review this data analysis report. Check for:
- Claims without supporting numbers
- Obvious analyses that were skipped
- Misleading conclusions

Original report:
{initial_report}

If the report is solid, return it unchanged. If it needs fixes, return
the improved version. Do not add fluff — only add substance."""

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a senior data analyst reviewing work."},
            {"role": "user", "content": review_prompt},
        ],
        temperature=0.1,
    )

    return response.choices[0].message.content

This doubles the token cost but catches a surprising number of issues — the agent sometimes draws conclusions that the data does not actually support, and the review step catches those.

Track Token Usage

class TokenTracker:
    """Track cumulative token usage across agent steps."""

    def __init__(self):
        self.total_input = 0
        self.total_output = 0

    def record(self, usage):
        self.total_input += usage.prompt_tokens
        self.total_output += usage.completion_tokens

    def cost(self, model: str = "gpt-4o-mini") -> float:
        rates = {
            "gpt-4o-mini": (0.15, 0.60),  # per 1M tokens
            "gpt-4o": (2.50, 10.00),
        }
        input_rate, output_rate = rates.get(model, (0.15, 0.60))
        return (
            self.total_input * input_rate / 1_000_000
            + self.total_output * output_rate / 1_000_000
        )

    def __str__(self):
        return (
            f"Tokens — input: {self.total_input:,}, "
            f"output: {self.total_output:,}"
        )

Wire this into the agent loop by calling tracker.record(response.usage) after each API call. At the end of a run, log the total cost so you know what each analysis session costs.

Step 7: Handling Edge Cases

Code Errors and Self-Correction

When the agent writes code that errors, the error message goes back into the context. Most of the time, the agent fixes its own mistake on the next step.

# inside the agent loop, after a tool call returns an error:
if result.startswith("ERROR:"):
    logger.info("Code error — agent will attempt self-correction")
    # the error is already in the tool result message,
    # so the agent sees it on the next iteration and adjusts

Track how often this happens. If the agent is self-correcting more than 20% of the time, your system prompt needs better instructions about the data format.

Large Datasets

For datasets larger than a few hundred megabytes, the inspect_data tool should not load the entire file.

def inspect_data_large(self) -> str:
    """Inspect a large dataset without loading it all into memory."""
    # read just the first 1000 rows for structure inspection
    df_sample = pd.read_csv(self.data_path, nrows=1000)

    # get actual row count without loading data
    row_count = sum(1 for _ in open(self.data_path)) - 1  # minus header

    parts = [
        f"Shape: {row_count:,} rows x {df_sample.shape[1]} columns",
        f"(inspecting first 1,000 rows for types and nulls)",
        f"\nColumns and types:\n{df_sample.dtypes.to_string()}",
        f"\nNull counts (in sample):\n{df_sample.isnull().sum().to_string()}",
        f"\nFirst 5 rows:\n{df_sample.head().to_string()}",
    ]
    return "\n".join(parts)

What This Replaces

Manual processAgent equivalent
Writing the same EDA notebook for every datasetAgent handles exploration autonomously
Waiting for a data analyst to be availableRun the agent immediately on any dataset
Skipping analysis on low-priority datasetsAgent is cheap enough to run on everything
Copy-pasting pandas snippets from Stack OverflowAgent writes analysis code tailored to your data
Writing a report after doing the analysisAgent produces the report as part of the analysis

Next Steps

For building the data pipelines that produce the datasets this agent analyses, see How to Design Data Pipelines for Reliable Reporting. For adding LLM-powered enrichment to your pipeline stages, see Build an LLM-Powered Data Pipeline with Python and OpenAI. For testing the code that the agent produces, see Testing Data Pipelines with Pytest. For making these analyses available as automated reports, see Build a Data Dashboard Without Excel.

Automation services include building AI agent systems for data analysis, report generation, and workflow automation.

Get in touch to discuss building an AI agent for your data analysis workflows.

Frequently Asked Questions

What is an AI agent and how is it different from a chatbot?
A chatbot generates text responses. An AI agent can take actions — it decides which tools to call, executes code, reads files, and iterates on results. The agent in this guide writes pandas code, runs it, reads the output, and decides what to analyse next without you telling it each step.
Is it safe to let an AI agent execute code?
Not without guardrails. The agent in this guide runs code in a restricted subprocess with import whitelisting, a timeout, and no filesystem write access beyond a scratch directory. It cannot install packages, make network requests, or modify files outside its sandbox.
Can I use this with local models instead of OpenAI?
Yes, if the model supports tool calling. Ollama with Llama 3, Mistral, or Qwen models works. The agent uses the OpenAI client format, and most local inference servers (Ollama, vLLM, LiteLLM) expose the same API. Smaller models produce worse analysis but the architecture stays the same.
How much does it cost to run an analysis?
A typical analysis session uses 5-15 tool calls. With GPT-4o-mini, that is roughly 20K-50K tokens per session, costing $0.01-0.05. GPT-4o costs about 10x more but produces better insights on complex datasets.
python ai agent ai data analysis agent openai function calling python llm agent tool use automated data analysis python ai agent framework python python agent code execution llm tool calling pipeline autonomous data analysis ai agent guardrails

Enjoyed this article?

Get notified when I publish new articles on automation, ecommerce, and data engineering.

Get in touch

Related Articles