Shopify Reporting API: How to Pull Sales, Inventory, and Customer Data Automatically
Learn how to automatically pull sales, inventory, and customer data from your Shopify store using Python and the Admin API. Covers authentication setup, handling pagination and rate limits, GraphQL bulk operations for large stores, and building a complete daily data pipeline — no manual CSV exports needed.
Shopify stores generate data constantly — orders, inventory movements, customer signups, abandoned carts. But the built-in reports only show you what Shopify decides to surface. If you need custom aggregations, cross-store comparisons, or automated data feeds, you need the API.
Most store owners hit this wall eventually: you want to know which products actually drive profit (not just revenue), which customers keep coming back, or where your inventory will run out next week. Shopify's dashboards give you a surface-level view, but they can't answer specific questions about your business without hours of manual export and spreadsheet work.
The Shopify Admin API solves this by letting you pull raw data directly into a Python script — automatically, on a schedule, with no manual clicking. You get every field Shopify stores about your orders, products, and customers, and you can transform it into exactly the reports and alerts you need.
This guide covers how to pull the three most valuable data types — sales, inventory, and customers — using both the REST Admin API and GraphQL, with proper authentication, pagination, and rate limit handling.
Note: Shopify marked the REST Admin API as legacy in October 2024. New public apps must use GraphQL exclusively. However, REST still works for custom apps and existing integrations — and it remains simpler for straightforward data pulls. This guide covers both approaches.
# Who This Is For
- Store owners tired of exporting CSVs from Shopify admin every week
- Operations teams who need automated inventory alerts before stock runs out
- Analysts building dashboards that combine Shopify data with ad spend, warehouse data, or accounting systems
- Developers building internal tools or client reporting for ecommerce brands
You don't need to be a software engineer — but you do need basic Python knowledge (variables, functions, running scripts). If you can run a Python file from your terminal, you can follow this guide.
# What You'll Need
Before starting, make sure you have:
- A Shopify store with admin access (you need to create a custom app)
- Python 3.9+ installed on your computer or server
- Three Python packages:
requests (makes API calls), pandas (organises data into tables), and optionally pyarrow (saves data efficiently)
Install them with:
pip install requests pandas pyarrow
# What the Shopify API Gives You
flowchart TD
    SA[Shopify Admin API] --> REST[REST API]
    SA --> GQL[GraphQL API]
    REST --> O[Orders]
    REST --> P[Products]
    REST --> C[Customers]
    REST --> I[Inventory]
    GQL --> BO[Bulk Operations]
    GQL --> RT[Real-time Queries]
    BO --> CSV[JSONL Export]
    RT --> D[Direct Response]
The REST API is simpler — one endpoint per resource, paginated responses. Think of it like requesting pages from a catalogue: you ask for page 1 of orders, get 250 results, then ask for the next page until there are no more.
The GraphQL API is more powerful — request exactly the fields you need, and use bulk operations for large exports without hitting rate limits. It's like placing a custom order: you tell Shopify exactly what data you want, and it packages everything into a single downloadable file.
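To make the contrast concrete, here's what a minimal plain (non-bulk) GraphQL call looks like. This is a sketch: the store domain and token are placeholders, and authentication setup is covered in the next section.

import requests

# Placeholder domain and token: see the authentication section below
url = "https://your-store.myshopify.com/admin/api/2026-01/graphql.json"
query = """
{
  orders(first: 5, query: "financial_status:paid") {
    edges { node { id name createdAt } }
  }
}
"""
response = requests.post(
    url,
    json={"query": query},
    headers={"X-Shopify-Access-Token": "shpat_your_token_here"},
    timeout=30,
)
for edge in response.json()["data"]["orders"]["edges"]:
    print(edge["node"]["name"], edge["node"]["createdAt"])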
# Authentication Setup
Shopify uses access tokens scoped to specific permissions. Think of this like a keycard for a building — each token only opens the doors you've specifically granted access to. This means your script can read order data without being able to delete products or change prices.
You need a custom app in your Shopify admin. This takes about 5 minutes to set up.
# Creating a Custom App
- Go to Settings → Apps and sales channels → Develop apps
- Create a new app
- Configure Admin API scopes:
- read_orders — order and transaction data (last 60 days)
- read_all_orders — full order history (requires approval for public apps)
- read_products — product and variant data
- read_customers — customer profiles
- read_inventory — stock levels
- Install the app and copy the Admin API access token
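Before building anything on top, it's worth a quick smoke test to confirm the token works. A minimal check, with placeholder domain and token values:

import requests

response = requests.get(
    "https://your-store.myshopify.com/admin/api/2026-01/shop.json",  # placeholder domain
    headers={"X-Shopify-Access-Token": "shpat_your_token_here"},  # placeholder token
    timeout=30,
)
response.raise_for_status()
print("Connected to:", response.json()["shop"]["name"])

A 401 here means the token is wrong; a 404 usually means the shop domain is.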
# Python Client Setup
Below is a reusable Python class that handles the repetitive parts of talking to the Shopify API — authentication headers, pagination (automatically fetching all pages of results), and rate limiting (slowing down before Shopify cuts you off). You'll use this client in every data pull that follows.
import requests
import time
from datetime import datetime, timedelta
class ShopifyAPI:
"""Production-ready Shopify Admin API client with rate limiting."""
def __init__(self, shop_domain, access_token, api_version="2026-01"):
# Build the base URL: e.g. https://my-store.myshopify.com/admin/api/2026-01
self.base_url = f"https://{shop_domain}/admin/api/{api_version}"
self.headers = {
"X-Shopify-Access-Token": access_token, # Your app's secret token
"Content-Type": "application/json",
}
self._calls_remaining = 40 # Shopify allows 40 calls before throttling
def _handle_rate_limit(self, response):
"""Track API call budget from response headers."""
# Shopify returns "used/total" e.g. "32/40" meaning 8 calls left
limit_header = response.headers.get("X-Shopify-Shop-Api-Call-Limit", "")
if limit_header:
used, total = limit_header.split("/")
self._calls_remaining = int(total) - int(used)
# Slow down when running low on calls to avoid hitting 429 errors
if self._calls_remaining < 5:
time.sleep(1.0)
def get(self, endpoint, params=None):
"""GET request with automatic pagination and rate limit handling."""
all_results = []
url = f"{self.base_url}/{endpoint}.json"
while url:
response = requests.get(url, headers=self.headers, params=params, timeout=30)
# If rate limited, wait the time Shopify tells us then retry
if response.status_code == 429:
retry_after = float(response.headers.get("Retry-After", 2))
time.sleep(retry_after)
continue
response.raise_for_status()
self._handle_rate_limit(response)
data = response.json()
# Extract the resource list from the response (e.g. "orders", "products")
resource = endpoint.split("/")[-1]
all_results.extend(data.get(resource, []))
# Check if there are more pages of results
url = None
link_header = response.headers.get("Link", "")
if 'rel="next"' in link_header:
for part in link_header.split(","):
if 'rel="next"' in part:
url = part.split(";")[0].strip(" <>")
params = None # Params only needed on first request; pagination URL includes them
return all_results
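With the class defined, every data pull in the rest of this guide reduces to a single call. For example (placeholder credentials):

client = ShopifyAPI("your-store.myshopify.com", "shpat_your_token_here")
products = client.get("products", params={"limit": 250, "fields": "id,title,status"})
print(f"Fetched {len(products)} products")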
# Pulling Sales Data
Orders are the core of Shopify reporting. The API exposes far more detail than the admin dashboard — you get individual line items, discount breakdowns, shipping costs, refund amounts, and timestamps precise to the second.
This is where most businesses start, because order data directly answers questions like:
- What was our actual revenue yesterday (after refunds and discounts)?
- Which products are selling fastest this week?
- What's our average order value trend over the last 3 months?
# Fetch Orders with Full Detail
This function pulls orders from a configurable time window and organises them into a pandas DataFrame (essentially a spreadsheet in Python — rows and columns you can filter, sort, and aggregate).
import pandas as pd
def pull_orders(client, days_back=30):
"""Pull orders from the last N days into a clean DataFrame."""
# Calculate the start date (e.g. 30 days ago) in ISO format
since = (datetime.now() - timedelta(days=days_back)).isoformat()
raw_orders = client.get("orders", params={
"created_at_min": since, # Only orders after this date
"status": "any", # Include open, closed, and cancelled
"limit": 250, # Max results per page (Shopify's maximum)
"fields": "id,order_number,created_at,processed_at,total_price,"
"subtotal_price,total_discounts,total_tax,financial_status,"
"fulfillment_status,line_items,customer,currency,"
"shipping_lines,refunds", # Only fetch fields we need (faster)
})
records = []
for order in raw_orders:
# Extract the fields we care about into a flat structure
record = {
"order_id": order["id"],
"order_number": order["order_number"],
"created_at": order["created_at"],
"processed_at": order.get("processed_at"), # When payment was captured
"total_price": float(order["total_price"]),
"subtotal_price": float(order.get("subtotal_price", 0)),
"total_discounts": float(order.get("total_discounts", 0)),
"total_tax": float(order.get("total_tax", 0)),
"financial_status": order["financial_status"], # paid, pending, refunded
"fulfillment_status": order.get("fulfillment_status") or "unfulfilled",
"item_count": len(order.get("line_items", [])),
"customer_id": order.get("customer", {}).get("id"),
"currency": order["currency"],
# Sum all shipping line costs for this order
"shipping_cost": sum(
float(s.get("price", 0)) for s in order.get("shipping_lines", [])
),
            # Sum refund amounts across all transactions of every refund
            "refund_amount": sum(
                float(t.get("amount", 0))
                for r in order.get("refunds", [])
                for t in r.get("transactions", [])
            ),
}
records.append(record)
# Convert to a DataFrame and parse date columns
df = pd.DataFrame(records)
df["created_at"] = pd.to_datetime(df["created_at"], utc=True)
df["processed_at"] = pd.to_datetime(df["processed_at"], utc=True)
df["date"] = df["processed_at"].dt.date # Simple date for daily grouping
print(f"Pulled {len(df)} orders from last {days_back} days")
return df
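Once the orders are in a DataFrame, the questions above become one-line aggregations. A quick sketch of a daily revenue summary built on the function above:

orders = pull_orders(client, days_back=30)

daily = orders.groupby("date").agg(
    gross_revenue=("total_price", "sum"),
    refunds=("refund_amount", "sum"),
    order_count=("order_id", "count"),
)
daily["net_revenue"] = daily["gross_revenue"] - daily["refunds"]
daily["avg_order_value"] = daily["gross_revenue"] / daily["order_count"]
print(daily.tail(7))  # The last 7 days, one row per day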
# Line-Item Level Data
For product-level reporting (top sellers, margin analysis), you need to look inside each order at the individual items purchased. The function below "unrolls" each order into its component products, so you can answer questions like "which SKU generated the most revenue?" or "which products get the biggest discounts?"
def pull_line_items(client, days_back=30):
"""Pull order line items for product-level analysis."""
since = (datetime.now() - timedelta(days=days_back)).isoformat()
raw_orders = client.get("orders", params={
"created_at_min": since,
"status": "any",
"limit": 250,
})
items = []
for order in raw_orders:
# Loop through each product in this order
for item in order.get("line_items", []):
items.append({
"order_id": order["id"],
"order_date": order["created_at"],
"product_id": item.get("product_id"),
"variant_id": item.get("variant_id"),
"title": item["title"],
"variant_title": item.get("variant_title", ""), # e.g. "Size: Large"
"sku": item.get("sku", ""),
"quantity": item["quantity"],
"price": float(item["price"]), # Price per unit
"total": float(item["price"]) * item["quantity"], # Revenue from this line
# Total discount applied to this specific item
"discount": sum(
float(d.get("amount", 0))
for d in item.get("discount_allocations", [])
),
})
df = pd.DataFrame(items)
df["order_date"] = pd.to_datetime(df["order_date"], utc=True)
print(f"Pulled {len(df)} line items from {len(raw_orders)} orders")
return df
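A top-sellers report is then a single groupby on the result:

line_items = pull_line_items(client, days_back=7)

top_sellers = (
    line_items.groupby(["sku", "title"])
    .agg(units_sold=("quantity", "sum"), revenue=("total", "sum"))
    .sort_values("revenue", ascending=False)
    .head(10)
)
print(top_sellers)  # Ten biggest earners this week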
# Pulling Inventory Data
Inventory in Shopify is tied to locations. A single product variant can have stock at multiple warehouses or retail stores. This is important for reporting because a product might show "10 in stock" overall, but that could be 8 in London and 2 in Manchester — meaning your Manchester store is about to run out.
# Current Stock Levels
This function fetches every active product variant and its stock count at each location. The result is a single table showing what you have, where you have it, and what's running low.
def pull_inventory(client):
"""Pull current inventory for all variants with location data."""
# Get all warehouse/store locations for this Shopify account
locations = client.get("locations")
location_map = {loc["id"]: loc["name"] for loc in locations}
# Get all products (we need their variant IDs to look up stock levels)
products = client.get("products", params={
"limit": 250,
"fields": "id,title,variants,status",
})
inventory = []
for product in products:
# Skip draft or archived products
if product.get("status") != "active":
continue
for variant in product.get("variants", []):
inventory_item_id = variant.get("inventory_item_id")
if not inventory_item_id:
continue
# Each variant can have stock at multiple locations
levels = client.get(
"inventory_levels",
params={"inventory_item_ids": inventory_item_id}
)
for level in levels:
inventory.append({
"product_id": product["id"],
"product_title": product["title"],
"variant_id": variant["id"],
"variant_title": variant.get("title", "Default"),
"sku": variant.get("sku", ""),
"price": float(variant.get("price", 0)),
"inventory_item_id": inventory_item_id,
"location_id": level["location_id"],
"location_name": location_map.get(level["location_id"], "Unknown"),
"available": level.get("available", 0), # Units in stock
})
df = pd.DataFrame(inventory)
print(f"Pulled inventory for {df['variant_id'].nunique()} variants "
f"across {df['location_id'].nunique()} locations")
return df
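One caveat: the function above makes a separate inventory_levels call per variant, which gets slow on large catalogues. The endpoint accepts up to 50 comma-separated inventory_item_ids per request, so a batched lookup (a sketch built on the same client) cuts the call count dramatically:

def pull_inventory_levels_batched(client, inventory_item_ids, batch_size=50):
    """Fetch stock levels in batches of up to 50 inventory_item_ids per call."""
    levels = []
    for i in range(0, len(inventory_item_ids), batch_size):
        batch = inventory_item_ids[i:i + batch_size]
        levels.extend(client.get(
            "inventory_levels",
            params={"inventory_item_ids": ",".join(str(x) for x in batch)},
        ))
    return levels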
# Low Stock Alerts
Once you have inventory data, you can instantly identify products about to sell out. The function below flags any variant with 5 or fewer units remaining (you can adjust the threshold). This replaces manually scanning Shopify's inventory page every morning.
def identify_low_stock(inventory_df, threshold=5):
"""Find variants running low across all locations."""
# Sum stock across all locations for each variant
stock_totals = (
inventory_df.groupby(["product_title", "sku", "variant_title", "price"])
.agg(total_available=("available", "sum"))
.reset_index()
)
# Variants at or below the threshold
low = stock_totals[stock_totals["total_available"] <= threshold]
# Completely sold out
out = stock_totals[stock_totals["total_available"] == 0]
print(f"Low stock ({threshold} or fewer): {len(low)} variants")
print(f"Out of stock: {len(out)} variants")
print(f"At-risk revenue: {(out['price'] * 1).sum():,.2f} (1 unit each)")
return low.sort_values("total_available")
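You can take this a step further and estimate when each SKU will actually sell out, by combining stock on hand with recent sales velocity. A rough sketch, assuming the DataFrames returned by pull_inventory and pull_line_items above:

import numpy as np

def estimate_days_of_cover(inventory_df, line_items_df, window_days=30):
    """Estimate days of stock remaining per SKU from recent sales velocity."""
    # Average units sold per day over the window, per SKU
    daily_velocity = line_items_df.groupby("sku")["quantity"].sum() / window_days
    # Total units on hand across all locations, per SKU
    stock_on_hand = inventory_df.groupby("sku")["available"].sum()
    # SKUs with no recent sales divide to infinity: drop them as "no imminent risk"
    days_of_cover = (stock_on_hand / daily_velocity).replace(np.inf, np.nan)
    return days_of_cover.dropna().sort_values()

Anything below your supplier's lead time is a reorder candidate.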
# Pulling Customer Data
Customer data enables segmentation, repeat purchase analysis, and lifetime value calculations. This is the data that answers: "Who are my best customers?", "How much does an average customer spend over their lifetime?", and "Which customers haven't ordered in 90 days?"
The function below pulls customer profiles along with their order totals, so you can immediately segment by value.
def pull_customers(client, days_back=90):
"""Pull customer profiles with order history summary."""
since = (datetime.now() - timedelta(days=days_back)).isoformat()
raw_customers = client.get("customers", params={
"created_at_min": since,
"limit": 250,
})
records = []
for cust in raw_customers:
records.append({
"customer_id": cust["id"],
"email": cust.get("email"),
"first_name": cust.get("first_name", ""),
"last_name": cust.get("last_name", ""),
"created_at": cust["created_at"],
"orders_count": cust.get("orders_count", 0), # Lifetime order count
"total_spent": float(cust.get("total_spent", 0)), # Lifetime spend
"currency": cust.get("currency", "GBP"),
"tags": cust.get("tags", ""), # Custom tags you've assigned in Shopify
"accepts_marketing": cust.get("accepts_marketing", False),
"city": cust.get("default_address", {}).get("city", ""),
"country": cust.get("default_address", {}).get("country", ""),
})
df = pd.DataFrame(records)
df["created_at"] = pd.to_datetime(df["created_at"], utc=True)
# Calculate average order value (total spent divided by number of orders)
df["avg_order_value"] = df.apply(
lambda r: r["total_spent"] / r["orders_count"] if r["orders_count"] > 0 else 0,
axis=1,
)
print(f"Pulled {len(df)} customers from last {days_back} days")
return df
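Basic segmentation follows directly from this table. For example, separating repeat buyers from one-off purchasers and pulling out your top spenders:

customers = pull_customers(client, days_back=365)

repeat = customers[customers["orders_count"] >= 2]
top_spenders = customers.nlargest(20, "total_spent")

print(f"Repeat customers: {len(repeat)} of {len(customers)}")
print(top_spenders[["email", "orders_count", "total_spent", "avg_order_value"]])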
# GraphQL Bulk Operations for Large Stores
The REST API works well for stores with under 10,000 orders per month. But if you're pulling hundreds of thousands of orders, it becomes painfully slow — each page takes a separate API call, and you're limited to 2 calls per second.
GraphQL bulk operations solve this by running your query in the background on Shopify's servers. Instead of pulling data page by page, you submit a query, wait a few minutes, and download a single file containing everything. It's like the difference between photocopying a book page by page versus having someone print you a fresh copy.
# Running a Bulk Export
The code below starts a bulk export, polls Shopify every 5 seconds to check if it's ready, and then downloads the results. The output is a JSONL file (one JSON object per line) which you can parse into a DataFrame.
import json
BULK_ORDERS_QUERY = """
mutation {
bulkOperationRunQuery(
query: \"\"\"
{
orders(query: "created_at:>2026-04-01") {
edges {
node {
id
name
createdAt
totalPriceSet { shopMoney { amount currencyCode } }
subtotalPriceSet { shopMoney { amount } }
totalDiscountsSet { shopMoney { amount } }
displayFinancialStatus
displayFulfillmentStatus
lineItems(first: 50) {
edges {
node {
title
quantity
originalUnitPriceSet { shopMoney { amount } }
sku
}
}
}
}
}
}
}
\"\"\"
) {
bulkOperation {
id
status
}
userErrors {
field
message
}
}
}
"""
def run_bulk_export(shop_domain, access_token, api_version="2026-01"):
"""Start a bulk operation and poll until complete."""
url = f"https://{shop_domain}/admin/api/{api_version}/graphql.json"
headers = {
"X-Shopify-Access-Token": access_token,
"Content-Type": "application/json",
}
# Submit the bulk query to Shopify's background processing
response = requests.post(url, json={"query": BULK_ORDERS_QUERY}, headers=headers, timeout=30)
response.raise_for_status()
result = response.json()
# Check for validation errors in the query
errors = result.get("data", {}).get("bulkOperationRunQuery", {}).get("userErrors", [])
if errors:
raise Exception(f"Bulk operation errors: {errors}")
op_id = result["data"]["bulkOperationRunQuery"]["bulkOperation"]["id"]
print(f"Bulk operation started: {op_id}")
# Poll every 5 seconds until Shopify finishes processing
poll_query = """
query {
currentBulkOperation {
id
status
errorCode
objectCount
url
}
}
"""
while True:
time.sleep(5) # Wait between status checks
poll_response = requests.post(url, json={"query": poll_query}, headers=headers, timeout=30)
poll_response.raise_for_status()
op = poll_response.json()["data"]["currentBulkOperation"]
if op["status"] == "COMPLETED":
print(f"Complete: {op['objectCount']} objects exported")
return op["url"] # Download URL for the results file
elif op["status"] in ("FAILED", "CANCELED"):
raise Exception(f"Bulk operation {op['status']}: {op.get('errorCode')}")
print(f" Status: {op['status']} ({op.get('objectCount', 0)} objects)")
def download_bulk_results(download_url):
"""Download and parse JSONL bulk operation results."""
response = requests.get(download_url, timeout=60)
response.raise_for_status()
# Each line is a separate JSON object (one per order/line item)
records = []
for line in response.text.strip().split("\n"):
records.append(json.loads(line))
print(f"Downloaded {len(records)} records from bulk export")
return records
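One detail about the JSONL format: nested records such as line items come back as their own lines, each carrying a __parentId field that points at the parent order. Splitting the file back into its two levels looks like this:

def split_bulk_records(records):
    """Separate top-level orders from nested line items in bulk JSONL output."""
    orders = [r for r in records if "__parentId" not in r]
    items = [r for r in records if "__parentId" in r]
    orders_df = pd.DataFrame(orders)
    items_df = pd.DataFrame(items)
    print(f"{len(orders_df)} orders, {len(items_df)} line items")
    return orders_df, items_df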
# When to Use Bulk Operations
Here's a quick decision guide based on your store size:
| Scenario | REST API | GraphQL Bulk |
|---|---|---|
| < 1,000 orders/month | ✓ | |
| 1,000–10,000 orders/month | ✓ (slower) | ✓ |
| > 10,000 orders/month | Too slow | ✓ |
| Real-time single-order lookup | ✓ | |
| Full catalogue export | Slow | ✓ |
| Incremental daily pulls | ✓ | |
# Rate Limits and Best Practices
Shopify limits how fast you can make API calls to protect their servers (and all the other stores sharing them). If you exceed the limit, Shopify temporarily blocks your requests with a 429 "Too Many Requests" error. Understanding these limits prevents your scripts from randomly failing halfway through.
# REST API Limits
Shopify uses a leaky bucket algorithm — imagine a bucket that holds 40 marbles and has a hole that lets 2 marbles out per second. You can throw marbles in as fast as you want (up to 40 at once), but once the bucket is full, you have to wait for marbles to leak out before adding more.
- Bucket size: 40 calls
- Leak rate: 2 calls/second
- Burst: Up to 40 calls without waiting
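In practice that means two things: chunk large jobs into batches, and back off before the bucket empties. The helper below does both, building on the ShopifyAPI client from earlier: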
def safe_batch_fetch(client, endpoint, ids, batch_size=50):
"""Fetch resources in batches to stay within rate limits."""
all_results = []
# Process IDs in chunks to avoid overloading the API
for i in range(0, len(ids), batch_size):
batch = ids[i:i + batch_size]
results = client.get(endpoint, params={
"ids": ",".join(str(id) for id in batch),
"limit": batch_size,
})
all_results.extend(results)
# Pause if we're running low on API calls
if client._calls_remaining < 10:
time.sleep(2)
return all_results
# Common Pitfalls
These are mistakes that trip up almost everyone building Shopify reporting for the first time:
- Not using processed_at for revenue timing — created_at is when the customer clicked "Buy"; processed_at is when payment was actually captured. For financial reporting you almost always want processed_at, because that's when the money actually arrived.
- Ignoring test orders — exclude orders whose test flag is true in production reporting.
- Forgetting timezone conversion — Shopify returns all timestamps in UTC (London winter time). If your store is in New York, an order placed at 11pm local time shows up as the next day in UTC. Always convert to your store's timezone before grouping by date, as shown in the sketch below.
- Not requesting access to older orders — only the last 60 days of orders are accessible by default. You need the read_all_orders scope (request it during app approval) to access the full history. status: any includes cancelled and closed orders but does not bypass the 60-day limit.
- Ignoring the API version lifecycle — Shopify releases a new API version quarterly and supports each one for roughly 12 months. Pin to a version (e.g. 2026-01), update during maintenance windows, and check the developer changelog for breaking changes.
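The timezone pitfall deserves a concrete example. Converting before grouping, with an assumed New York store (swap in your store's IANA timezone):

# orders comes from pull_orders(); created_at is already tz-aware UTC
orders["created_local"] = orders["created_at"].dt.tz_convert("America/New_York")
orders["local_date"] = orders["created_local"].dt.date  # Group on this, not the UTC date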
# Putting It Together: Automated Data Pipeline
Here's the complete pipeline that ties everything together. This single script pulls all four data types, saves them as Parquet files (a compact, fast file format that's better than CSV for data analysis), and checks for low stock alerts. Run this daily with a scheduler like cron or Prefect, and you'll never manually export from Shopify again.
import os
from pathlib import Path
def run_daily_data_pull():
"""Complete daily data extraction pipeline."""
# Load credentials from environment variables (never hard-code secrets)
client = ShopifyAPI(
shop_domain=os.environ["SHOPIFY_STORE"],
access_token=os.environ["SHOPIFY_API_TOKEN"],
)
# Pull yesterday's data for each resource type
orders = pull_orders(client, days_back=1)
line_items = pull_line_items(client, days_back=1)
inventory = pull_inventory(client) # Always pulls current snapshot
customers = pull_customers(client, days_back=1)
# Save each dataset to a date-stamped folder as Parquet files
output_dir = Path("data") / datetime.now().strftime("%Y-%m-%d")
output_dir.mkdir(parents=True, exist_ok=True)
orders.to_parquet(output_dir / "orders.parquet", index=False)
line_items.to_parquet(output_dir / "line_items.parquet", index=False)
inventory.to_parquet(output_dir / "inventory.parquet", index=False)
customers.to_parquet(output_dir / "customers.parquet", index=False)
# Check for products that need restocking
low_stock = identify_low_stock(inventory)
if not low_stock.empty:
print(f"\nWARNING: {len(low_stock)} variants at low stock - send alert")
print(f"\nDaily pull complete: {output_dir}")
return {
"orders": len(orders),
"line_items": len(line_items),
"inventory_variants": len(inventory),
"customers": len(customers),
}
if __name__ == "__main__":
run_daily_data_pull()
Example output from a daily run:
Pulled 47 orders from last 1 days
Pulled 89 line items from 47 orders
Pulled inventory for 342 variants across 2 locations
Low stock (5 or fewer): 12 variants
Out of stock: 3 variants
Pulled 18 customers from last 1 days
Daily pull complete: data/2026-05-08
# Next Steps
This covers the data extraction layer — the foundation of any reporting system. Once you have clean, automated data pulls running daily, you've eliminated the manual export step entirely. From here, you can build layers on top:
- Automate the reports — turn raw data into formatted Excel reports or live dashboards that stakeholders can read without touching Python
- Schedule the pipeline — run daily via cron or Prefect so data is always fresh without manual intervention
- Store historically — feed into PostgreSQL or a data warehouse for long-term trend analysis (compare this December to last December)
- Add alerting — trigger Slack or email notifications when KPIs drop below thresholds (revenue down 20%, conversion rate dropped, stock running out)
The key insight is that the API gives you all the data. The limitation was never access — it was automation. With these scripts running on a schedule, your reporting goes from "someone exports a CSV on Monday morning" to "data arrives automatically, reports update themselves, alerts fire before problems become crises."
For the full reporting pipeline that turns this data into automated Shopify reports, see How to Automate Shopify Reports with Python and the Shopify API.
Ecommerce optimization services include building custom Shopify API reporting pipelines tailored to your store's metrics.
Get in touch to discuss automating your Shopify data extraction.