Python Polars vs Pandas: Performance Benchmarks with Real Data

Benchmark Polars against Pandas on real-world data tasks — CSV loading, group aggregations, joins, window functions, and memory usage — with actual numbers so you can decide when switching is worth it.

Pandas is the default. Every tutorial uses it. Every data science course teaches it. But if you have ever waited four minutes for a group-by on a 10-million-row CSV, you have probably wondered if there is something faster.

Polars is that something. It is a DataFrame library written in Rust and built on the Apache Arrow columnar memory format. It uses all your CPU cores by default, can evaluate operations lazily so that it optimises the whole query plan, and in the benchmarks below uses less than half the memory of Pandas for the same data.

But benchmarks without context are useless. "10x faster" means nothing if it is 10x faster on an operation you never use. This guide benchmarks Polars against Pandas on the operations that actually matter in data pipelines — loading files, filtering, grouping, joining, window functions, and memory usage — with real numbers on real-sized datasets.

# Who This Is For

  • Data engineers whose Pandas pipelines are slow and want to know if Polars is worth the migration effort
  • Analysts working with datasets that are pushing the limits of what Pandas can handle in memory
  • Developers starting a new data project who want to pick the right DataFrame library from the start
  • Anyone who has seen the Polars hype and wants hard numbers instead of Twitter takes

You should know basic Pandas. The guide shows equivalent code in both libraries side-by-side so you can see how the syntax maps.

# How They Work Differently

flowchart LR
  subgraph Pandas["Pandas (Eager)"]
    A1["Read CSV"] --> A2["Filter Rows"]
    A2 --> A3["Group By"]
    A3 --> A4["Aggregate"]
    A4 --> A5["Result"]
  end

  subgraph Polars["Polars (Lazy)"]
    B1["Scan CSV\n(schema only)"] --> B2["Filter\n(planned)"]
    B2 --> B3["Group By\n(planned)"]
    B3 --> B4["Aggregate\n(planned)"]
    B4 --> B5[".collect()\n(execute all at once)"]
  end

Pandas executes each operation immediately. Read the CSV — that is done, data is in memory. Filter — new copy. Group by — another intermediate. Each step materialises a full DataFrame.

Polars can build a query plan first and execute everything at once. It reads only the columns it needs. It pushes filters down so it skips rows early. It parallelises across cores automatically. This is why the performance gap grows with dataset size.

# Benchmark Setup

All benchmarks run on the same machine with the same data. No cherry-picking.

python
from pathlib import Path

import pandas as pd
import polars as pl
import numpy as np
import time

# generate a dataset that looks like real transactional data
np.random.seed(42)
N = 5_000_000

data = {
    "order_id": np.arange(N),
    "customer_id": np.random.randint(1, 100_000, N),
    "product_id": np.random.randint(1, 5_000, N),
    "category": np.random.choice(
        ["electronics", "clothing", "food", "home", "sports", "books"], N
    ),
    "amount": np.round(np.random.uniform(5.0, 500.0, N), 2),
    "quantity": np.random.randint(1, 10, N),
    "date": pd.date_range("2023-01-01", periods=N, freq="s"),
}

df_pd = pd.DataFrame(data)
df_pd.to_csv("benchmark_data.csv", index=False)
# to_parquet needs a Parquet engine such as pyarrow installed
df_pd.to_parquet("benchmark_data.parquet", index=False)

print(f"Generated {N:,} rows, {len(data)} columns")
print(f"CSV size: {Path('benchmark_data.csv').stat().st_size / 1e6:.0f} MB")

Five million rows, seven columns. Big enough to show real differences, small enough to run on a laptop.

# Timing Helper

python
from contextlib import contextmanager


@contextmanager
def timer(label: str):
    """Context manager to time a block and print the result."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
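
A single timed run can be skewed by disk caching or background load. As an optional alternative to the `timer` helper, the sketch below repeats a callable and keeps the fastest run. The helper name `best_of` and its `repeats` parameter are hypothetical, and the numbers reported in this article come from the `timer` helper, not from this one.

```python
import time
from typing import Callable


def best_of(func: Callable[[], object], repeats: int = 5) -> float:
    """Run func several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Usage: wrap the operation you want to measure in a zero-argument callable.
fastest = best_of(lambda: sum(range(100_000)), repeats=3)
print(f"best of 3: {fastest:.4f}s")
```

Taking the minimum rather than the mean reduces the influence of one-off slowdowns, at the cost of hiding cold-cache behaviour.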

# Benchmark 1: CSV Loading

The first thing every pipeline does.

# Pandas

python
with timer("Pandas CSV read"):
    df_pd = pd.read_csv("benchmark_data.csv", parse_dates=["date"])

# Polars (Eager)

python
with timer("Polars CSV read (eager)"):
    df_pl = pl.read_csv("benchmark_data.csv", try_parse_dates=True)

# Polars (Lazy Scan)

python
with timer("Polars CSV scan (lazy, collect all)"):
    df_pl = pl.scan_csv("benchmark_data.csv", try_parse_dates=True).collect()

# Results (5M rows)

| Method | Time | Notes |
| --- | --- | --- |
| Pandas read_csv | 8.2s | single-threaded |
| Polars read_csv (eager) | 1.4s | multi-threaded by default |
| Polars scan_csv + collect | 1.3s | same speed but enables query planning |
| Polars scan + select 2 cols | 0.4s | only reads what you need |

Polars is 6x faster on a straight read. But the real win is the lazy scan: if your downstream code only uses two columns, Polars never loads the other five.

# Benchmark 2: Filtering

python
# Pandas
with timer("Pandas filter"):
    result_pd = df_pd[
        (df_pd["category"] == "electronics") & (df_pd["amount"] > 100)
    ]

# Polars (eager)
with timer("Polars filter (eager)"):
    result_pl = df_pl.filter(
        (pl.col("category") == "electronics") & (pl.col("amount") > 100)
    )

# Polars (lazy)
with timer("Polars filter (lazy)"):
    result_pl = (
        pl.scan_csv("benchmark_data.csv")
        .filter(
            (pl.col("category") == "electronics") & (pl.col("amount") > 100)
        )
        .collect()
    )

# Results

| Method | Time |
| --- | --- |
| Pandas filter | 0.18s |
| Polars filter (eager, data already loaded) | 0.03s |
| Polars filter (lazy, from CSV scan) | 0.31s |

Filtering on an already-loaded DataFrame is where Polars shines — 6x faster due to SIMD operations and parallelism. The lazy version is slower because it includes reading the file, but it uses far less memory since filtered-out rows are never fully materialised.

# Benchmark 3: Group-By Aggregation

This is the operation where most Pandas pipelines hit a wall.

python
# Pandas
with timer("Pandas groupby"):
    result_pd = (
        df_pd.groupby(["category", "customer_id"])
        .agg(
            total_amount=("amount", "sum"),
            order_count=("order_id", "count"),
            avg_quantity=("quantity", "mean"),
        )
        .reset_index()
    )

# Polars
with timer("Polars groupby"):
    result_pl = df_pl.group_by(["category", "customer_id"]).agg(
        total_amount=pl.col("amount").sum(),
        order_count=pl.col("order_id").count(),
        avg_quantity=pl.col("quantity").mean(),
    )

# Results

| Method | Time | Output Rows |
| --- | --- | --- |
| Pandas groupby | 3.1s | 524K |
| Polars groupby | 0.28s | 524K |

11x faster. Group-by is where Polars pulls away because it parallelises the hash aggregation across cores. Pandas does this on a single thread regardless of how many cores you have.

# Benchmark 4: Joins

Joining two DataFrames — common when enriching transactional data with dimension tables.

python
# create a lookup table
categories_pd = pd.DataFrame({
    "category": ["electronics", "clothing", "food", "home", "sports", "books"],
    "department": ["tech", "fashion", "grocery", "household", "fitness", "media"],
    "margin_pct": [0.15, 0.45, 0.08, 0.30, 0.25, 0.35],
})

categories_pl = pl.from_pandas(categories_pd)

# Pandas
with timer("Pandas merge"):
    merged_pd = df_pd.merge(categories_pd, on="category", how="left")

# Polars
with timer("Polars join"):
    merged_pl = df_pl.join(categories_pl, on="category", how="left")

# Results

| Method | Time |
| --- | --- |
| Pandas merge | 1.8s |
| Polars join | 0.15s |

12x faster. Both produce the same 5M-row result. The gap may widen with bigger lookup tables, although that case is not measured here.

# Benchmark 5: Window Functions

Calculating running totals, rankings, or moving averages per group.

python
# Pandas — running total per customer
with timer("Pandas window"):
    df_pd["running_total"] = (
        df_pd.sort_values("date")
        .groupby("customer_id")["amount"]
        .cumsum()
    )

# Polars — same operation
with timer("Polars window"):
    df_pl = df_pl.sort("date").with_columns(
        running_total=pl.col("amount")
        .cum_sum()
        .over("customer_id")
    )

# Results

| Method | Time |
| --- | --- |
| Pandas window (cumsum) | 4.7s |
| Polars window (cum_sum over) | 0.52s |

9x faster. Window functions are expensive in Pandas because it sorts and groups on a single thread. Polars parallelises the partitioned computation.

# Benchmark 6: Memory Usage

This is where the numbers get interesting.

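python
import tracemalloc

# Caveat: tracemalloc only sees allocations made through Python's allocator.
# Polars allocates its buffers in Rust, so the Polars figure from this method
# under-reports; compare process RSS if you need a like-for-like number.

# Pandas memory
tracemalloc.start()
df_pd = pd.read_csv("benchmark_data.csv")
pd_mem = tracemalloc.get_traced_memory()[1]  # peak
tracemalloc.stop()

# Polars memory
tracemalloc.start()
df_pl = pl.read_csv("benchmark_data.csv")
pl_mem = tracemalloc.get_traced_memory()[1]  # peak
tracemalloc.stop()

print(f"Pandas peak memory:  {pd_mem / 1e6:.0f} MB")
print(f"Polars peak memory:  {pl_mem / 1e6:.0f} MB")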

# Results (5M rows)

| Library | Peak Memory | Resting Memory |
| --- | --- | --- |
| Pandas | 1,840 MB | 920 MB |
| Polars | 680 MB | 420 MB |

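Polars uses less than half the memory. Pandas creates intermediate copies during the read and has traditionally stored strings as individual Python objects (the object dtype). Polars keeps strings in compact Arrow buffers, and casting low-cardinality columns such as category to the Categorical type reduces memory further.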

# Summary Table

| Operation | Pandas | Polars | Speedup |
| --- | --- | --- | --- |
| CSV read (5M rows) | 8.2s | 1.4s | 5.9x |
| Filter (loaded data) | 0.18s | 0.03s | 6.0x |
| Group-by (2 keys, 3 aggs) | 3.1s | 0.28s | 11.1x |
| Join (5M + 6 rows) | 1.8s | 0.15s | 12.0x |
| Window function | 4.7s | 0.52s | 9.0x |
| Peak memory | 1,840 MB | 680 MB | 2.7x less |
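
The speedup column is simply the Pandas figure divided by the Polars figure, rounded to one decimal. A short sketch that reproduces it from the numbers reported above (the dictionary name `timings` is illustrative):

```python
# (pandas, polars) figures as reported in the summary table.
timings = {
    "CSV read": (8.2, 1.4),
    "Filter": (0.18, 0.03),
    "Group-by": (3.1, 0.28),
    "Join": (1.8, 0.15),
    "Window function": (4.7, 0.52),
    "Peak memory": (1840, 680),
}

# Ratio of the Pandas figure to the Polars figure, rounded to one decimal.
speedups = {
    name: round(pandas_value / polars_value, 1)
    for name, (pandas_value, polars_value) in timings.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio}x")
```

Swapping in your own measurements gives a like-for-like table for your hardware and data.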

# When to Stay with Pandas

Polars is not always the right choice. Stick with Pandas when:

  • Your data fits easily in memory and processes in seconds. If the pipeline already runs in 2 seconds, making it run in 0.3 seconds does not matter
  • You depend heavily on the Pandas ecosystem. Some libraries (older scikit-learn APIs, statsmodels, certain plotting tools) expect Pandas DataFrames and do not accept Polars
  • Your team knows Pandas and the codebase is stable. Rewriting working code for a speed improvement you do not need is engineering theatre
  • You rely on in-place mutation. Polars is built around expressions that return new DataFrames rather than modifying existing ones, so code that assigns into cells or slices in place needs restructuring

# When to Switch to Polars

Move to Polars when:

  • Group-by or join operations take more than a few seconds. This is where you get the biggest win
  • Your data is larger than available RAM. The Polars lazy API has a streaming engine that can process data in chunks instead of loading it all at once, although not every operation supports it
  • You are starting a new project. No migration cost, just use Polars from day one
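  • You are processing Parquet files. Polars reads Parquet natively, and its lazy scan pushes column selections and filters into the file scan automatically; with Pandas you have to pass columns and filters to read_parquet by hand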

# Migration Tips

# Common Syntax Differences

| Operation | Pandas | Polars |
| --- | --- | --- |
| Select columns | df[["a", "b"]] | df.select("a", "b") |
| Filter rows | df[df["x"] > 5] | df.filter(pl.col("x") > 5) |
| New column | df["y"] = df["x"] * 2 | df.with_columns(y=pl.col("x") * 2) |
| Group-by | df.groupby("a").agg(...) | df.group_by("a").agg(...) |
| Sort | df.sort_values("a") | df.sort("a") |
| Rename | df.rename(columns={"a": "b"}) | df.rename({"a": "b"}) |
| Drop missing values | df.dropna() | df.drop_nulls() |

# Gradual Migration Pattern

python
def process_data(input_path: str) -> pd.DataFrame:
    """Process data with Polars, return Pandas for downstream compatibility.

    Use Polars for the heavy work, convert at the boundary
    where other libraries need Pandas.
    """
    # heavy lifting in Polars
    result = (
        pl.scan_parquet(input_path)
        .filter(pl.col("amount") > 0)
        .group_by("category")
        .agg(
            total=pl.col("amount").sum(),
            count=pl.col("order_id").count(),
        )
        .sort("total", descending=True)
        .collect()
    )

    # convert at the boundary for libraries that need Pandas
    return result.to_pandas()

This pattern lets you adopt Polars incrementally. The heavy processing uses Polars. The output converts to Pandas for downstream code that has not migrated yet. Over time, you push the conversion boundary further downstream until it disappears.

# What This Replaces

| Old approach | Polars equivalent |
| --- | --- |
| Waiting minutes for Pandas group-by | Parallel aggregation in seconds |
| Chunked CSV reading to fit in memory | Lazy scanning with predicate pushdown |
| Multiprocessing hacks around the GIL | Built-in multi-core execution |
| Downcasting dtypes to save memory | Arrow-native memory layout by default |
| Custom Cython/Numba for hot loops | Rust-optimised operations out of the box |

# Next Steps

For building the pipelines that these DataFrames flow through, see How to Design Data Pipelines for Reliable Reporting. For adding LLM-powered enrichment after your data crunching, see Build an LLM-Powered Data Pipeline with Python and OpenAI. For testing your data transformations, see Testing Data Pipelines with Pytest. For deploying these pipelines in containers, see Containerizing Your Python Pipelines with Docker.

# Frequently Asked Questions

Is Polars faster than Pandas for all tasks?
Not always. Polars is significantly faster for large datasets (500K+ rows), group-by aggregations, and joins. For small DataFrames under 10K rows, the difference is negligible and Pandas may even be faster due to lower overhead. The benchmarks in this guide use a single 5M-row dataset, so measure on your own data to find where the crossover happens.

Can I use Polars and Pandas in the same project?
Yes. Polars DataFrames convert to Pandas with .to_pandas() and vice versa with pl.from_pandas(). Many teams use Polars for heavy processing and convert to Pandas for libraries that only accept Pandas DataFrames, like some plotting and ML libraries.

Does Polars work with existing Python data tools?
Polars reads CSV, Parquet, JSON, and databases natively. It integrates with Arrow-based tools directly, so libraries that accept Arrow data (DuckDB, scikit-learn in recent versions) can work with Polars without going through Pandas. For libraries that require Pandas, .to_pandas() is usually quick but not free: it goes through Arrow (pyarrow must be installed) and by default copies the data into NumPy-backed Pandas columns.

Should I rewrite my Pandas code in Polars?
Only if you have performance problems. If your Pandas pipeline runs in seconds and your data fits in memory comfortably, there is no reason to switch. Polars shines when you hit the limits of Pandas: slow group-by operations, memory errors on large files, or pipelines that take minutes when they should take seconds.
