Shopify Reporting API: How to Pull Sales, Inventory, and Customer Data Automatically
Learn how to automatically pull sales, inventory, and customer data from your Shopify store using Python and the Admin API. Covers authentication setup, handling pagination and rate limits, GraphQL bulk operations for large stores, and building a complete daily data pipeline — no manual CSV exports needed.
Shopify stores generate data constantly — orders, inventory movements, customer signups, abandoned carts. But the built-in reports only show you what Shopify decides to surface. If you need custom aggregations, cross-store comparisons, or automated data feeds, you need the API.
Most store owners hit this wall eventually: you want to know which products actually drive profit (not just revenue), which customers keep coming back, or where your inventory will run out next week. Shopify's dashboards give you a surface-level view, but they can't answer specific questions about your business without hours of manual export and spreadsheet work.
The Shopify Admin API solves this by letting you pull raw data directly into a Python script — automatically, on a schedule, with no manual clicking. You get every field Shopify stores about your orders, products, and customers, and you can transform it into exactly the reports and alerts you need.
This guide covers how to pull the three most valuable data types — sales, inventory, and customers — using both the REST Admin API and GraphQL, with proper authentication, pagination, and rate limit handling.
Note: Shopify marked the REST Admin API as legacy in October 2024. New public apps must use GraphQL exclusively. However, REST still works for custom apps and existing integrations — and it remains simpler for straightforward data pulls. This guide covers both approaches.
# Who This Is For
- Store owners tired of exporting CSVs from Shopify admin every week
- Operations teams who need automated inventory alerts before stock runs out
- Analysts building dashboards that combine Shopify data with ad spend, warehouse data, or accounting systems
- Developers building internal tools or client reporting for ecommerce brands
You don't need to be a software engineer — but you do need basic Python knowledge (variables, functions, running scripts). If you can run a Python file from your terminal, you can follow this guide.
# What You'll Need
Before starting, make sure you have:
- A Shopify store with admin access (you need to create a custom app)
- Python 3.9+ installed on your computer or server
- Three Python packages:
requests (makes API calls), pandas (organises data into tables), and optionally pyarrow (saves data efficiently)
Install them with:
pip install requests pandas pyarrow
# What the Shopify API Gives You
flowchart TD
    SA[Shopify Admin API] --> REST[REST API]
    SA --> GQL[GraphQL API]
    REST --> O[Orders]
    REST --> P[Products]
    REST --> C[Customers]
    REST --> I[Inventory]
    GQL --> BO[Bulk Operations]
    GQL --> RT[Real-time Queries]
    BO --> CSV[JSONL Export]
    RT --> D[Direct Response]
The REST API is simpler — one endpoint per resource, paginated responses. Think of it like requesting pages from a catalogue: you ask for page 1 of orders, get 250 results, then ask for the next page until there are no more.
The GraphQL API is more powerful — request exactly the fields you need, and use bulk operations for large exports without hitting rate limits. It's like placing a custom order: you tell Shopify exactly what data you want, and it packages everything into a single downloadable file.
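To make the contrast concrete, here's what a minimal plain (non-bulk) GraphQL call looks like. This is a sketch: the store domain and token are placeholders, and authentication setup is covered in the next section.

import requests

# Placeholder domain and token: see the authentication section below
url = "https://your-store.myshopify.com/admin/api/2026-01/graphql.json"
query = """
{
  orders(first: 5, query: "financial_status:paid") {
    edges { node { id name createdAt } }
  }
}
"""
response = requests.post(
    url,
    json={"query": query},
    headers={"X-Shopify-Access-Token": "shpat_your_token_here"},
    timeout=30,
)
for edge in response.json()["data"]["orders"]["edges"]:
    print(edge["node"]["name"], edge["node"]["createdAt"])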
# Authentication Setup
Shopify uses access tokens scoped to specific permissions. Think of this like a keycard for a building — each token only opens the doors you've specifically granted access to. This means your script can read order data without being able to delete products or change prices.
You need a custom app in your Shopify admin. This takes about 5 minutes to set up.
# Creating a Custom App
- Go to Settings → Apps and sales channels → Develop apps
- Create a new app
- Configure Admin API scopes:
- read_orders — order and transaction data (last 60 days)
- read_all_orders — full order history (requires approval for public apps)
- read_products — product and variant data
- read_customers — customer profiles
- read_inventory — stock levels
- Install the app and copy the Admin API access token
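Before building anything on top, it's worth a quick smoke test to confirm the token works. A minimal check, with placeholder domain and token values:

import requests

response = requests.get(
    "https://your-store.myshopify.com/admin/api/2026-01/shop.json",  # placeholder domain
    headers={"X-Shopify-Access-Token": "shpat_your_token_here"},  # placeholder token
    timeout=30,
)
response.raise_for_status()
print("Connected to:", response.json()["shop"]["name"])

A 401 here means the token is wrong; a 404 usually means the shop domain is.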
# Python Client Setup
Below is a reusable Python class that handles the repetitive parts of talking to the Shopify API — authentication headers, pagination (automatically fetching all pages of results), and rate limiting (slowing down before Shopify cuts you off). You'll use this client in every data pull that follows.
import requests
import time
from datetime import datetime, timedelta
class ShopifyAPI:
"""Production-ready Shopify Admin API client with rate limiting."""
def __init__(self, shop_domain, access_token, api_version="2026-01"):
# Build the base URL: e.g. https://my-store.myshopify.com/admin/api/2026-01
self.base_url = f"https://{shop_domain}/admin/api/{api_version}"
self.headers = {
"X-Shopify-Access-Token": access_token, # Your app's secret token
"Content-Type": "application/json",
}
self._calls_remaining = 40 # Shopify allows 40 calls before throttling
def _handle_rate_limit(self, response):
"""Track API call budget from response headers."""
# Shopify returns "used/total" e.g. "32/40" meaning 8 calls left
limit_header = response.headers.get("X-Shopify-Shop-Api-Call-Limit", "")
if limit_header:
used, total = limit_header.split("/")
self._calls_remaining = int(total) - int(used)
# Slow down when running low on calls to avoid hitting 429 errors
if self._calls_remaining < 5:
time.sleep(1.0)
def get(self, endpoint, params=None):
"""GET request with automatic pagination and rate limit handling."""
all_results = []
url = f"{self.base_url}/{endpoint}.json"
while url:
response = requests.get(url, headers=self.headers, params=params, timeout=30)
# If rate limited, wait the time Shopify tells us then retry
if response.status_code == 429:
retry_after = float(response.headers.get("Retry-After", 2))
time.sleep(retry_after)
continue
response.raise_for_status()
self._handle_rate_limit(response)
data = response.json()
# Extract the resource list from the response (e.g. "orders", "products")
resource = endpoint.split("/")[-1]
all_results.extend(data.get(resource, []))
# Check if there are more pages of results
url = None
link_header = response.headers.get("Link", "")
if 'rel="next"' in link_header:
for part in link_header.split(","):
if 'rel="next"' in part:
url = part.split(";")[0].strip(" <>")
params = None # Params only needed on first request; pagination URL includes them
return all_results
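With the class defined, every data pull in the rest of this guide reduces to a single call. For example (placeholder credentials):

client = ShopifyAPI("your-store.myshopify.com", "shpat_your_token_here")
products = client.get("products", params={"limit": 250, "fields": "id,title,status"})
print(f"Fetched {len(products)} products")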
# Pulling Sales Data
Orders are the core of Shopify reporting. The API exposes far more detail than the admin dashboard — you get individual line items, discount breakdowns, shipping costs, refund amounts, and timestamps precise to the second.
This is where most businesses start, because order data directly answers questions like:
- What was our actual revenue yesterday (after refunds and discounts)?
- Which products are selling fastest this week?
- What's our average order value trend over the last 3 months?
# Fetch Orders with Full Detail
This function pulls orders from a configurable time window and organises them into a pandas DataFrame (essentially a spreadsheet in Python — rows and columns you can filter, sort, and aggregate).
import pandas as pd
def pull_orders(client, days_back=30):
"""Pull orders from the last N days into a clean DataFrame."""
# Calculate the start date (e.g. 30 days ago) in ISO format
since = (datetime.now() - timedelta(days=days_back)).isoformat()
raw_orders = client.get("orders", params={
"created_at_min": since, # Only orders after this date
"status": "any", # Include open, closed, and cancelled
"limit": 250, # Max results per page (Shopify's maximum)
"fields": "id,order_number,created_at,processed_at,total_price,"
"subtotal_price,total_discounts,total_tax,financial_status,"
"fulfillment_status,line_items,customer,currency,"
"shipping_lines,refunds", # Only fetch fields we need (faster)
})
records = []
for order in raw_orders:
# Extract the fields we care about into a flat structure
record = {
"order_id": order["id"],
"order_number": order["order_number"],
"created_at": order["created_at"],
"processed_at": order.get("processed_at"), # When payment was captured
"total_price": float(order["total_price"]),
"subtotal_price": float(order.get("subtotal_price", 0)),
"total_discounts": float(order.get("total_discounts", 0)),
"total_tax": float(order.get("total_tax", 0)),
"financial_status": order["financial_status"], # paid, pending, refunded
"fulfillment_status": order.get("fulfillment_status") or "unfulfilled",
"item_count": len(order.get("line_items", [])),
"customer_id": order.get("customer", {}).get("id"),
"currency": order["currency"],
# Sum all shipping line costs for this order
"shipping_cost": sum(
float(s.get("price", 0)) for s in order.get("shipping_lines", [])
),
            # Sum refund amounts across all transactions of every refund
            "refund_amount": sum(
                float(t.get("amount", 0))
                for r in order.get("refunds", [])
                for t in r.get("transactions", [])
            ),
}
records.append(record)
# Convert to a DataFrame and parse date columns
df = pd.DataFrame(records)
df["created_at"] = pd.to_datetime(df["created_at"], utc=True)
df["processed_at"] = pd.to_datetime(df["processed_at"], utc=True)
df["date"] = df["processed_at"].dt.date # Simple date for daily grouping
print(f"Pulled {len(df)} orders from last {days_back} days")
return df
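Once the orders are in a DataFrame, the questions above become one-line aggregations. A quick sketch of a daily revenue summary built on the function above:

orders = pull_orders(client, days_back=30)

daily = orders.groupby("date").agg(
    gross_revenue=("total_price", "sum"),
    refunds=("refund_amount", "sum"),
    order_count=("order_id", "count"),
)
daily["net_revenue"] = daily["gross_revenue"] - daily["refunds"]
daily["avg_order_value"] = daily["gross_revenue"] / daily["order_count"]
print(daily.tail(7))  # The last 7 days, one row per day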
# Line-Item Level Data
For product-level reporting (top sellers, margin analysis), you need to look inside each order at the individual items purchased. The function below "unrolls" each order into its component products, so you can answer questions like "which SKU generated the most revenue?" or "which products get the biggest discounts?"
def pull_line_items(client, days_back=30):
"""Pull order line items for product-level analysis."""
since = (datetime.now() - timedelta(days=days_back)).isoformat()
raw_orders = client.get("orders", params={
"created_at_min": since,
"status": "any",
"limit": 250,
})
items = []
for order in raw_orders:
# Loop through each product in this order
for item in order.get("line_items", []):
items.append({
"order_id": order["id"],
"order_date": order["created_at"],
"product_id": item.get("product_id"),
"variant_id": item.get("variant_id"),
"title": item["title"],
"variant_title": item.get("variant_title", ""), # e.g. "Size: Large"
"sku": item.get("sku", ""),
"quantity": item["quantity"],
"price": float(item["price"]), # Price per unit
"total": float(item["price"]) * item["quantity"], # Revenue from this line
# Total discount applied to this specific item
"discount": sum(
float(d.get("amount", 0))
for d in item.get("discount_allocations", [])
),
})
df = pd.DataFrame(items)
df["order_date"] = pd.to_datetime(df["order_date"], utc=True)
print(f"Pulled {len(df)} line items from {len(raw_orders)} orders")
return df
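A top-sellers report is then a single groupby on the result:

line_items = pull_line_items(client, days_back=7)

top_sellers = (
    line_items.groupby(["sku", "title"])
    .agg(units_sold=("quantity", "sum"), revenue=("total", "sum"))
    .sort_values("revenue", ascending=False)
    .head(10)
)
print(top_sellers)  # Ten biggest earners this week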
# Pulling Inventory Data
Inventory in Shopify is tied to locations. A single product variant can have stock at multiple warehouses or retail stores. This is important for reporting because a product might show "10 in stock" overall, but that could be 8 in London and 2 in Manchester — meaning your Manchester store is about to run out.
# Current Stock Levels
This function fetches every active product variant and its stock count at each location. The result is a single table showing what you have, where you have it, and what's running low.
def pull_inventory(client):
"""Pull current inventory for all variants with location data."""
# Get all warehouse/store locations for this Shopify account
locations = client.get("locations")
location_map = {loc["id"]: loc["name"] for loc in locations}
# Get all products (we need their variant IDs to look up stock levels)
products = client.get("products", params={
"limit": 250,
"fields": "id,title,variants,status",
})
inventory = []
for product in products:
# Skip draft or archived products
if product.get("status") != "active":
continue
for variant in product.get("variants", []):
inventory_item_id = variant.get("inventory_item_id")
if not inventory_item_id:
continue
# Each variant can have stock at multiple locations
levels = client.get(
"inventory_levels",
params={"inventory_item_ids": inventory_item_id}
)
for level in levels:
inventory.append({
"product_id": product["id"],
"product_title": product["title"],
"variant_id": variant["id"],
"variant_title": variant.get("title", "Default"),
"sku": variant.get("sku", ""),
"price": float(variant.get("price", 0)),
"inventory_item_id": inventory_item_id,
"location_id": level["location_id"],
"location_name": location_map.get(level["location_id"], "Unknown"),
"available": level.get("available", 0), # Units in stock
})
df = pd.DataFrame(inventory)
print(f"Pulled inventory for {df['variant_id'].nunique()} variants "
f"across {df['location_id'].nunique()} locations")
return df
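One caveat: the function above makes a separate inventory_levels call per variant, which gets slow on large catalogues. The endpoint accepts up to 50 comma-separated inventory_item_ids per request, so a batched lookup (a sketch built on the same client) cuts the call count dramatically:

def pull_inventory_levels_batched(client, inventory_item_ids, batch_size=50):
    """Fetch stock levels in batches of up to 50 inventory_item_ids per call."""
    levels = []
    for i in range(0, len(inventory_item_ids), batch_size):
        batch = inventory_item_ids[i:i + batch_size]
        levels.extend(client.get(
            "inventory_levels",
            params={"inventory_item_ids": ",".join(str(x) for x in batch)},
        ))
    return levels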
# Low Stock Alerts
Once you have inventory data, you can instantly identify products about to sell out. The function below flags any variant with 5 or fewer units remaining (you can adjust the threshold). This replaces manually scanning Shopify's inventory page every morning.
def identify_low_stock(inventory_df, threshold=5):
"""Find variants running low across all locations."""
# Sum stock across all locations for each variant
stock_totals = (
inventory_df.groupby(["product_title", "sku", "variant_title", "price"])
.agg(total_available=("available", "sum"))
.reset_index()
)
# Variants at or below the threshold
low = stock_totals[stock_totals["total_available"] <= threshold]
# Completely sold out
out = stock_totals[stock_totals["total_available"] == 0]
print(f"Low stock ({threshold} or fewer): {len(low)} variants")
print(f"Out of stock: {len(out)} variants")
print(f"At-risk revenue: {(out['price'] * 1).sum():,.2f} (1 unit each)")
return low.sort_values("total_available")
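You can take this a step further and estimate when each SKU will actually sell out, by combining stock on hand with recent sales velocity. A rough sketch, assuming the DataFrames returned by pull_inventory and pull_line_items above:

import numpy as np

def estimate_days_of_cover(inventory_df, line_items_df, window_days=30):
    """Estimate days of stock remaining per SKU from recent sales velocity."""
    # Average units sold per day over the window, per SKU
    daily_velocity = line_items_df.groupby("sku")["quantity"].sum() / window_days
    # Total units on hand across all locations, per SKU
    stock_on_hand = inventory_df.groupby("sku")["available"].sum()
    # SKUs with no recent sales divide to infinity: drop them as "no imminent risk"
    days_of_cover = (stock_on_hand / daily_velocity).replace(np.inf, np.nan)
    return days_of_cover.dropna().sort_values()

Anything below your supplier's lead time is a reorder candidate.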
# Pulling Customer Data
Customer data enables segmentation, repeat purchase analysis, and lifetime value calculations. This is the data that answers: "Who are my best customers?", "How much does an average customer spend over their lifetime?", and "Which customers haven't ordered in 90 days?"
The function below pulls customer profiles along with their order totals, so you can immediately segment by value.
def pull_customers(client, days_back=90):
"""Pull customer profiles with order history summary."""
since = (datetime.now() - timedelta(days=days_back)).isoformat()
raw_customers = client.get("customers", params={
"created_at_min": since,
"limit": 250,
})
records = []
for cust in raw_customers:
records.append({
"customer_id": cust["id"],
"email": cust.get("email"),
"first_name": cust.get("first_name", ""),
"last_name": cust.get("last_name", ""),
"created_at": cust["created_at"],
"orders_count": cust.get("orders_count", 0), # Lifetime order count
"total_spent": float(cust.get("total_spent", 0)), # Lifetime spend
"currency": cust.get("currency", "GBP"),
"tags": cust.get("tags", ""), # Custom tags you've assigned in Shopify
"accepts_marketing": cust.get("accepts_marketing", False),
"city": cust.get("default_address", {}).get("city", ""),
"country": cust.get("default_address", {}).get("country", ""),
})
df = pd.DataFrame(records)
df["created_at"] = pd.to_datetime(df["created_at"], utc=True)
# Calculate average order value (total spent divided by number of orders)
df["avg_order_value"] = df.apply(
lambda r: r["total_spent"] / r["orders_count"] if r["orders_count"] > 0 else 0,
axis=1,
)
print(f"Pulled {len(df)} customers from last {days_back} days")
return df
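Basic segmentation follows directly from this table. For example, separating repeat buyers from one-off purchasers and pulling out your top spenders:

customers = pull_customers(client, days_back=365)

repeat = customers[customers["orders_count"] >= 2]
top_spenders = customers.nlargest(20, "total_spent")

print(f"Repeat customers: {len(repeat)} of {len(customers)}")
print(top_spenders[["email", "orders_count", "total_spent", "avg_order_value"]])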
# GraphQL Bulk Operations for Large Stores
The REST API works well for stores with under 10,000 orders per month. But if you're pulling hundreds of thousands of orders, it becomes painfully slow — each page takes a separate API call, and you're limited to 2 calls per second.
GraphQL bulk operations solve this by running your query in the background on Shopify's servers. Instead of pulling data page by page, you submit a query, wait a few minutes, and download a single file containing everything. It's like the difference between photocopying a book page by page versus having someone print you a fresh copy.
# Running a Bulk Export
The code below starts a bulk export, polls Shopify every 5 seconds to check if it's ready, and then downloads the results. The output is a JSONL file (one JSON object per line) which you can parse into a DataFrame.
import json
BULK_ORDERS_QUERY = """
mutation {
bulkOperationRunQuery(
query: \"\"\"
{
orders(query: "created_at:>2026-04-01") {
edges {
node {
id
name
createdAt
totalPriceSet { shopMoney { amount currencyCode } }
subtotalPriceSet { shopMoney { amount } }
totalDiscountsSet { shopMoney { amount } }
displayFinancialStatus
displayFulfillmentStatus
lineItems(first: 50) {
edges {
node {
title
quantity
originalUnitPriceSet { shopMoney { amount } }
sku
}
}
}
}
}
}
}
\"\"\"
) {
bulkOperation {
id
status
}
userErrors {
field
message
}
}
}
"""
def run_bulk_export(shop_domain, access_token, api_version="2026-01"):
"""Start a bulk operation and poll until complete."""
url = f"https://{shop_domain}/admin/api/{api_version}/graphql.json"
headers = {
"X-Shopify-Access-Token": access_token,
"Content-Type": "application/json",
}
# Submit the bulk query to Shopify's background processing
response = requests.post(url, json={"query": BULK_ORDERS_QUERY}, headers=headers, timeout=30)
response.raise_for_status()
result = response.json()
# Check for validation errors in the query
errors = result.get("data", {}).get("bulkOperationRunQuery", {}).get("userErrors", [])
if errors:
raise Exception(f"Bulk operation errors: {errors}")
op_id = result["data"]["bulkOperationRunQuery"]["bulkOperation"]["id"]
print(f"Bulk operation started: {op_id}")
# Poll every 5 seconds until Shopify finishes processing
poll_query = """
query {
currentBulkOperation {
id
status
errorCode
objectCount
url
}
}
"""
while True:
time.sleep(5) # Wait between status checks
poll_response = requests.post(url, json={"query": poll_query}, headers=headers, timeout=30)
poll_response.raise_for_status()
op = poll_response.json()["data"]["currentBulkOperation"]
if op["status"] == "COMPLETED":
print(f"Complete: {op['objectCount']} objects exported")
return op["url"] # Download URL for the results file
elif op["status"] in ("FAILED", "CANCELED"):
raise Exception(f"Bulk operation {op['status']}: {op.get('errorCode')}")
print(f" Status: {op['status']} ({op.get('objectCount', 0)} objects)")
def download_bulk_results(download_url):
"""Download and parse JSONL bulk operation results."""
response = requests.get(download_url, timeout=60)
response.raise_for_status()
# Each line is a separate JSON object (one per order/line item)
records = []
for line in response.text.strip().split("\n"):
records.append(json.loads(line))
print(f"Downloaded {len(records)} records from bulk export")
return records
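One detail about the JSONL format: nested records such as line items come back as their own lines, each carrying a __parentId field that points at the parent order. Splitting the file back into its two levels looks like this:

def split_bulk_records(records):
    """Separate top-level orders from nested line items in bulk JSONL output."""
    orders = [r for r in records if "__parentId" not in r]
    items = [r for r in records if "__parentId" in r]
    orders_df = pd.DataFrame(orders)
    items_df = pd.DataFrame(items)
    print(f"{len(orders_df)} orders, {len(items_df)} line items")
    return orders_df, items_df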
# When to Use Bulk Operations
Here's a quick decision guide based on your store size:
| Scenario | REST API | GraphQL Bulk |
|---|---|---|
| < 1,000 orders/month | ✓ | |
| 1,000–10,000 orders/month | ✓ (slower) | ✓ |
| > 10,000 orders/month | Too slow | ✓ |
| Real-time single-order lookup | ✓ | |
| Full catalogue export | Slow | ✓ |
| Incremental daily pulls | ✓ | |
# Rate Limits and Best Practices
Shopify limits how fast you can make API calls to protect their servers (and all the other stores sharing them). If you exceed the limit, Shopify temporarily blocks your requests with a 429 "Too Many Requests" error. Understanding these limits prevents your scripts from randomly failing halfway through.
# REST API Limits
Shopify uses a leaky bucket algorithm — imagine a bucket that holds 40 marbles and has a hole that lets 2 marbles out per second. You can throw marbles in as fast as you want (up to 40 at once), but once the bucket is full, you have to wait for marbles to leak out before adding more.
- Bucket size: 40 calls
- Leak rate: 2 calls/second
- Burst: Up to 40 calls without waiting
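In practice that means two things: chunk large jobs into batches, and back off before the bucket empties. The helper below does both, building on the ShopifyAPI client from earlier: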
def safe_batch_fetch(client, endpoint, ids, batch_size=50):
"""Fetch resources in batches to stay within rate limits."""
all_results = []
# Process IDs in chunks to avoid overloading the API
for i in range(0, len(ids), batch_size):
batch = ids[i:i + batch_size]
results = client.get(endpoint, params={
"ids": ",".join(str(id) for id in batch),
"limit": batch_size,
})
all_results.extend(results)
# Pause if we're running low on API calls
if client._calls_remaining < 10:
time.sleep(2)
return all_results
# Common Pitfalls
These are mistakes that trip up almost everyone building Shopify reporting for the first time:
- Not using processed_at for revenue timing — created_at is when the customer clicked "Buy"; processed_at is when payment was actually captured. For financial reporting you almost always want processed_at, because that's when the money actually arrived.
- Ignoring test orders — exclude orders whose test flag is true in production reporting.
- Forgetting timezone conversion — Shopify returns all timestamps in UTC (London winter time). If your store is in New York, an order placed at 11pm local time shows up as the next day in UTC. Always convert to your store's timezone before grouping by date, as shown in the sketch below.
- Not requesting access to older orders — only the last 60 days of orders are accessible by default. You need the read_all_orders scope (request it during app approval) to access the full history. status: any includes cancelled and closed orders but does not bypass the 60-day limit.
- Ignoring the API version lifecycle — Shopify releases a new API version quarterly and supports each one for roughly 12 months. Pin to a version (e.g. 2026-01), update during maintenance windows, and check the developer changelog for breaking changes.
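The timezone pitfall deserves a concrete example. Converting before grouping, with an assumed New York store (swap in your store's IANA timezone):

# orders comes from pull_orders(); created_at is already tz-aware UTC
orders["created_local"] = orders["created_at"].dt.tz_convert("America/New_York")
orders["local_date"] = orders["created_local"].dt.date  # Group on this, not the UTC date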
# Putting It Together: Automated Data Pipeline
Here's the complete pipeline that ties everything together. This single script pulls all four data types, saves them as Parquet files (a compact, fast file format that's better than CSV for data analysis), and checks for low stock alerts. Run this daily with a scheduler like cron or Prefect, and you'll never manually export from Shopify again.
import os
from pathlib import Path
def run_daily_data_pull():
"""Complete daily data extraction pipeline."""
# Load credentials from environment variables (never hard-code secrets)
client = ShopifyAPI(
shop_domain=os.environ["SHOPIFY_STORE"],
access_token=os.environ["SHOPIFY_API_TOKEN"],
)
# Pull yesterday's data for each resource type
orders = pull_orders(client, days_back=1)
line_items = pull_line_items(client, days_back=1)
inventory = pull_inventory(client) # Always pulls current snapshot
customers = pull_customers(client, days_back=1)
# Save each dataset to a date-stamped folder as Parquet files
output_dir = Path("data") / datetime.now().strftime("%Y-%m-%d")
output_dir.mkdir(parents=True, exist_ok=True)
orders.to_parquet(output_dir / "orders.parquet", index=False)
line_items.to_parquet(output_dir / "line_items.parquet", index=False)
inventory.to_parquet(output_dir / "inventory.parquet", index=False)
customers.to_parquet(output_dir / "customers.parquet", index=False)
# Check for products that need restocking
low_stock = identify_low_stock(inventory)
if not low_stock.empty:
print(f"\nWARNING: {len(low_stock)} variants at low stock - send alert")
print(f"\nDaily pull complete: {output_dir}")
return {
"orders": len(orders),
"line_items": len(line_items),
"inventory_variants": len(inventory),
"customers": len(customers),
}
if __name__ == "__main__":
run_daily_data_pull()
Example output from a daily run:
Pulled 47 orders from last 1 days
Pulled 89 line items from 47 orders
Pulled inventory for 342 variants across 2 locations
Low stock (5 or fewer): 12 variants
Out of stock: 3 variants
Pulled 18 customers from last 1 days
Daily pull complete: data/2026-05-08
# Next Steps
This covers the data extraction layer — the foundation of any reporting system. Once you have clean, automated data pulls running daily, you've eliminated the manual export step entirely. From here, you can build layers on top:
- Automate the reports — turn raw data into formatted Excel reports or live dashboards that stakeholders can read without touching Python
- Schedule the pipeline — run daily via cron or Prefect so data is always fresh without manual intervention
- Store historically — feed into PostgreSQL or a data warehouse for long-term trend analysis (compare this December to last December)
- Add alerting — trigger Slack or email notifications when KPIs drop below thresholds (revenue down 20%, conversion rate dropped, stock running out)
The key insight is that the API gives you all the data. The limitation was never access — it was automation. With these scripts running on a schedule, your reporting goes from "someone exports a CSV on Monday morning" to "data arrives automatically, reports update themselves, alerts fire before problems become crises."
For the full reporting pipeline that turns this data into automated Shopify reports, see How to Automate Shopify Reports with Python and the Shopify API.
Ecommerce optimization services include building custom Shopify API reporting pipelines tailored to your store's metrics.
Get in touch to discuss automating your Shopify data extraction.