Build a Price Monitoring Bot with Python and Playwright
Build a price monitoring bot that tracks competitor prices on any website — using Playwright for JavaScript-rendered pages, structured extraction with fallback selectors, change detection, and scheduled alerts when prices move.
You check a competitor's price by hand. You open the page, scan for the number, maybe write it down. Then you forget to check for two weeks and miss the price drop that undercut you.
Ecommerce pricing is not something you should do manually. Competitor prices change constantly — flash sales, seasonal adjustments, inventory clearances. If you are not tracking them automatically, you are reacting instead of competing. By the time you notice a change, your customers have already noticed it too.
This guide builds a price monitoring bot in Python that scrapes competitor prices from any website, handles JavaScript-rendered pages, stores price history, and sends you an alert on the next check after a price changes by more than a threshold you set.
# Who This Is For
- Ecommerce store owners who want to track competitor prices without paying for expensive monitoring tools
- Data engineers building price intelligence systems for marketing or merchandising teams
- Developers who need to scrape dynamic websites that do not work with simple HTTP requests
- Anyone who checks competitor websites manually and wants to stop
Basic Python is all you need. The guide covers Playwright setup, CSS selectors, and browser automation from scratch.
# Bot Architecture
flowchart LR
    CFG["Config\n(URLs + selectors)"] --> SCH["Scheduler\n(cron / interval)"]
    SCH --> PW["Playwright\n(headless browser)"]
    PW --> EXT["Extract\n(price + metadata)"]
    EXT --> VAL["Validate\n(parse + clean)"]
    VAL --> DB["Store\n(SQLite history)"]
    DB --> CHK["Change\nDetection"]
    CHK -->|Threshold crossed| ALT["Alert\n(email / Slack)"]
    CHK -->|No change| LOG["Log\n(price stable)"]
Playwright handles the browser rendering. The extraction layer uses multiple selector strategies so a single CSS class change does not break everything. Change detection compares the current price against the last known value and fires alerts when the delta exceeds your threshold.
# What You Will Need
pip install playwright httpx sqlite-utils schedule pandas
playwright install chromium
- playwright: browser automation (renders JavaScript, handles SPAs)
- httpx: lightweight HTTP client, useful for robots.txt checks (the core bot code below does not call it)
- sqlite-utils: simple SQLite wrapper for price history
- schedule: cron-like scheduling in Python
- pandas: used in Step 6 for the price history report
The playwright install chromium command downloads a Chromium binary of roughly 150 MB; the exact size varies by platform and Playwright version.
# Step 1: Page Configuration
Define what to scrape and where to find the price on each page.
from dataclasses import dataclass


@dataclass
class ProductConfig:
    """Configuration for monitoring a single product page."""
    name: str
    url: str
    # multiple selectors in priority order: if the first breaks, try the next
    price_selectors: list[str]
    currency: str = "GBP"
    # optional: grab product name from the page to detect if the URL changed
    title_selector: str | None = None

    def __post_init__(self):
        if not self.price_selectors:
            raise ValueError(f"No price selectors for {self.name}")


# example configs: adjust selectors for your target sites
PRODUCTS = [
    ProductConfig(
        name="Competitor A - Widget Pro",
        url="https://competitor-a.com/products/widget-pro",
        price_selectors=[
            "[data-testid='price']",            # most stable: data attributes
            ".product-price .current-price",    # class-based fallback
            "span.price",                       # broad fallback
        ],
        title_selector="h1.product-title",
    ),
    ProductConfig(
        name="Competitor B - Widget Pro",
        url="https://competitor-b.com/widget-pro",
        price_selectors=[
            ".price-box .special-price",
            ".product-info-price .price",
            "[itemprop='price']",               # schema.org microdata
        ],
        title_selector="h1",
    ),
]
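Each product gets three selectors, tried in priority order. Data attributes like data-testid survive redesigns better than CSS classes. Schema.org microdata like [itemprop='price'] is even more stable, because sites rarely remove structured data that affects their search rankings. One caveat: microdata prices often sit in a content attribute on a meta tag rather than in visible text, so check how the target page marks up the price before relying on that selector.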
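To illustrate the microdata case, here is a minimal sketch that pulls itemprop="price" values out of an HTML string using only the standard library. The class and function names are hypothetical and not part of the bot above; the HTML could come from Playwright's page.content(). It reads the content attribute when present and falls back to the element's text otherwise.

```python
from html.parser import HTMLParser


class MicrodataPriceParser(HTMLParser):
    """Collect values of elements marked itemprop="price" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values: list[str] = []
        self._capture_text = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("itemprop") != "price":
            return
        content = attr_map.get("content")
        if content:
            # e.g. <meta itemprop="price" content="19.99"> carries the value here
            self.values.append(content.strip())
        else:
            # otherwise take the next piece of text inside the element
            self._capture_text = True

    def handle_data(self, data):
        if self._capture_text and data.strip():
            self.values.append(data.strip())
            self._capture_text = False

    def handle_endtag(self, tag):
        # stop waiting for text once any element closes
        self._capture_text = False


def microdata_prices(html: str) -> list[str]:
    """Return raw price strings found in itemprop="price" elements."""
    parser = MicrodataPriceParser()
    parser.feed(html)
    return parser.values
```

The returned strings are still raw text, so they would go through the same price parsing as any other selector result.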
# Step 2: Price Extraction with Playwright
import re
import logging
from playwright.sync_api import sync_playwright, TimeoutError as PwTimeout

logger = logging.getLogger("price_bot")


def extract_price(page, selectors: list[str]) -> float | None:
    """Try each selector until one returns a valid price.

    Returns the first successfully parsed price, or None if all fail.
    """
    for selector in selectors:
        try:
            el = page.wait_for_selector(selector, timeout=5000)
            if not el:
                continue
            raw_text = (el.text_content() or "").strip()
            # take the first number in the text, so "£12.99 was £15.99" gives 12.99;
            # assumes "," is a thousands separator and "." is the decimal point
            match = re.search(r"\d[\d,]*(?:\.\d+)?", raw_text)
            if not match:
                continue
            price = float(match.group().replace(",", ""))
            if price <= 0:
                continue
            logger.debug(f"Got price {price} from selector: {selector}")
            return price
        except PwTimeout:
            logger.debug(f"Selector timed out: {selector}")
            continue
        except (ValueError, AttributeError) as e:
            logger.debug(f"Parse error with selector {selector}: {e}")
            continue
    return None
def scrape_product(product: ProductConfig) -> dict | None:
    """Launch a browser, navigate to the product page, extract the price.

    Returns a dict with name, url, price, currency, and title, or None on failure.
    """
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1280, "height": 720},
        )
        page = context.new_page()
        try:
            page.goto(product.url, wait_until="networkidle", timeout=30000)
            price = extract_price(page, product.price_selectors)
            if price is None:
                logger.warning(f"Could not extract price for {product.name}")
                return None
            title = None
            if product.title_selector:
                try:
                    title_el = page.query_selector(product.title_selector)
                    title = title_el.text_content().strip() if title_el else None
                except Exception:
                    # a missing title should not fail the whole scrape
                    pass
            return {
                "name": product.name,
                "url": product.url,
                "price": price,
                "currency": product.currency,
                "title": title,
            }
        except Exception as e:
            logger.error(f"Failed to scrape {product.name}: {e}")
            return None
        finally:
            browser.close()
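Notice the wait_until="networkidle": this waits until the page has made no network requests for at least 500 ms, which is usually when JavaScript-rendered prices are available. Pages that keep polling in the background (analytics, chat widgets) may never go idle and will hit the timeout instead, and Playwright's documentation discourages networkidle as a readiness signal. The wait_for_selector call in extract_price is what actually confirms the price element exists. The timeout is 30 seconds because some ecommerce sites are slow.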
The user agent is set to a real Chrome string. Some sites serve different content (or block) requests from default Playwright user agents.
# Step 3: Price History Storage
SQLite is perfect for this. No server to manage, the database is a single file, and you can query historical prices with SQL.
import sqlite_utils
from datetime import datetime, timedelta, timezone


class PriceStore:
    """SQLite-backed price history with change detection."""

    def __init__(self, db_path: str = "prices.db"):
        self.db = sqlite_utils.Database(db_path)
        self._ensure_tables()

    def _ensure_tables(self):
        if "prices" not in self.db.table_names():
            self.db["prices"].create({
                "id": int,
                "product_name": str,
                "url": str,
                "price": float,
                "currency": str,
                "title": str,
                "scraped_at": str,
            }, pk="id")
            self.db["prices"].create_index(["product_name", "scraped_at"])

    def record(self, product_name: str, url: str, price: float,
               currency: str, title: str | None = None):
        """Store a price observation."""
        self.db["prices"].insert({
            "product_name": product_name,
            "url": url,
            "price": price,
            "currency": currency,
            "title": title or "",
            "scraped_at": datetime.now(timezone.utc).isoformat(),
        })

    def last_price(self, product_name: str) -> float | None:
        """Get the most recent price for a product."""
        rows = list(self.db.execute(
            "SELECT price FROM prices WHERE product_name = ? "
            "ORDER BY scraped_at DESC LIMIT 1",
            [product_name],
        ).fetchall())
        return rows[0][0] if rows else None

    def price_history(self, product_name: str, days: int = 30) -> list[dict]:
        """Get price history for the last N days."""
        # compute the cutoff in Python so it has the same ISO 8601 format as
        # scraped_at; SQLite's datetime() returns "YYYY-MM-DD HH:MM:SS", which
        # does not compare cleanly against the stored "T"-separated strings
        cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
        rows = self.db.execute(
            "SELECT price, scraped_at FROM prices "
            "WHERE product_name = ? AND scraped_at > ? "
            "ORDER BY scraped_at ASC",
            [product_name, cutoff],
        ).fetchall()
        return [{"price": r[0], "date": r[1]} for r in rows]
The last_price method is key for change detection. Before storing a new price, you compare it to the last one. The index on (product_name, scraped_at) keeps lookups fast even with months of history.
# Step 4: Change Detection and Alerts
import smtplib
from email.mime.text import MIMEText
import os


class PriceAlert:
    """Detect price changes and send alerts."""

    def __init__(self, threshold_pct: float = 5.0):
        self.threshold = threshold_pct / 100.0

    def check(self, product_name: str, new_price: float,
              old_price: float | None) -> dict | None:
        """Compare prices and return alert info if threshold is exceeded."""
        if old_price is None:
            # first observation, no comparison possible
            return None
        if old_price == 0:
            return None
        change_pct = (new_price - old_price) / old_price
        abs_change = abs(change_pct)
        if abs_change < self.threshold:
            return None
        direction = "dropped" if change_pct < 0 else "increased"
        return {
            "product": product_name,
            "old_price": old_price,
            "new_price": new_price,
            "change_pct": round(change_pct * 100, 1),
            "direction": direction,
        }
def send_alert_email(alerts: list[dict]):
    """Send a price change summary via email.

    Requires SMTP_HOST, SMTP_USER, SMTP_PASS, ALERT_EMAIL env vars.
    """
    if not alerts:
        return
    body_lines = ["Price changes detected:\n"]
    for a in alerts:
        body_lines.append(
            f"  {a['product']}: {a['old_price']:.2f} -> {a['new_price']:.2f} "
            f"({a['change_pct']:+.1f}% {a['direction']})"
        )
    body = "\n".join(body_lines)
    logger.info(body)
    smtp_host = os.environ.get("SMTP_HOST")
    if not smtp_host:
        logger.warning("SMTP_HOST not set; skipping email, logged above")
        return
    msg = MIMEText(body)
    msg["Subject"] = f"Price Alert: {len(alerts)} product(s) changed"
    msg["From"] = os.environ["SMTP_USER"]
    msg["To"] = os.environ["ALERT_EMAIL"]
    with smtplib.SMTP(smtp_host, 587) as server:
        server.starttls()
        server.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
        server.send_message(msg)
    logger.info(f"Alert email sent for {len(alerts)} price changes")
The threshold is configurable. 5% is a reasonable default — you do not want alerts for a $0.10 fluctuation on a $200 product. For high-value items, you might drop it to 2%. For commodity products where margins are thin, 1%.
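As a worked example of those numbers, the sketch below applies a different percentage threshold per product category. The category names and the THRESHOLDS_PCT mapping are hypothetical; the percentages are the ones suggested above, and the comparison matches PriceAlert.check (a change exactly at the threshold alerts).

```python
# hypothetical categories; percentages taken from the guidance in the text
THRESHOLDS_PCT = {"default": 5.0, "high_value": 2.0, "commodity": 1.0}


def pct_change(old_price: float, new_price: float) -> float:
    """Signed percentage change from old_price to new_price."""
    return (new_price - old_price) / old_price * 100


def should_alert(old_price: float, new_price: float,
                 category: str = "default") -> bool:
    """True when the absolute change reaches the category's threshold."""
    threshold = THRESHOLDS_PCT.get(category, THRESHOLDS_PCT["default"])
    return abs(pct_change(old_price, new_price)) >= threshold


# a 0.10 move on a 200.00 product is about 0.05%, far below any threshold
print(should_alert(200.00, 199.90))                # False
# a drop to 189.00 is 5.5%, which crosses the 5% default
print(should_alert(200.00, 189.00))                # True
# a 2.5% drop alerts only under the tighter high-value threshold
print(should_alert(200.00, 195.00))                # False
print(should_alert(200.00, 195.00, "high_value"))  # True
```

To use something like this with the bot, each ProductConfig would need a category or threshold field of its own, which the config in Step 1 does not have.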
# Step 5: The Monitoring Loop
import time
import random
import schedule


def run_check():
    """Run a single price check cycle across all configured products."""
    store = PriceStore()
    alerter = PriceAlert(threshold_pct=5.0)
    alerts = []
    for product in PRODUCTS:
        # polite delay between requests: 3-7 seconds
        delay = random.uniform(3.0, 7.0)
        logger.info(f"Checking {product.name} (waiting {delay:.1f}s)")
        time.sleep(delay)
        result = scrape_product(product)
        if result is None:
            logger.warning(f"Skipping {product.name}: scrape failed")
            continue
        # read the previous price before recording the new one
        old_price = store.last_price(product.name)
        store.record(
            product_name=result["name"],
            url=result["url"],
            price=result["price"],
            currency=result["currency"],
            title=result.get("title"),
        )
        alert = alerter.check(product.name, result["price"], old_price)
        if alert:
            alerts.append(alert)
            logger.info(
                f"PRICE CHANGE: {product.name} "
                f"{alert['old_price']:.2f} -> {alert['new_price']:.2f} "
                f"({alert['change_pct']:+.1f}%)"
            )
        else:
            # first observation, or a change below the alert threshold
            logger.info(f"{product.name}: {result['price']:.2f} (no alert)")
    if alerts:
        send_alert_email(alerts)
    logger.info(f"Check complete: {len(PRODUCTS)} products, {len(alerts)} alerts")
def main():
    """Run the price monitoring bot on a schedule."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(name)s] %(levelname)s %(message)s",
    )
    logger.info(f"Starting price monitor for {len(PRODUCTS)} products")
    # run once immediately, then on schedule
    run_check()
    # check every 4 hours, around the clock
    schedule.every(4).hours.do(run_check)
    while True:
        schedule.run_pending()
        time.sleep(60)


if __name__ == "__main__":
    main()
The random delay between 3 and 7 seconds is important. Fixed delays are a bot fingerprint — real users do not click at exact intervals. The schedule runs every 4 hours, which is frequent enough for price intelligence and infrequent enough to avoid getting blocked.
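Polite scraping also usually means honouring robots.txt, which is what httpx was installed for. The bot above does not do this, so the following is only a sketch of one way to add it: fetch the file with httpx and evaluate it with the standard library's urllib.robotparser. The function names and the "price-bot" agent string are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "price-bot"  # hypothetical agent name used for rule matching


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_robots_txt(url: str) -> str:
    """Download robots.txt for the site hosting url; empty string if missing."""
    import httpx  # imported lazily so is_allowed works without httpx installed

    parts = urlsplit(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    response = httpx.get(robots_url, timeout=10.0, follow_redirects=True)
    # assumption: treat any non-200 response as "no rules published"
    return response.text if response.status_code == 200 else ""


example_rules = "User-agent: *\nDisallow: /checkout/\n"
print(is_allowed(example_rules, "https://competitor-a.com/products/widget-pro"))
print(is_allowed(example_rules, "https://competitor-a.com/checkout/cart"))
```

In run_check, a call such as is_allowed(fetch_robots_txt(product.url), product.url) before scrape_product would skip disallowed pages; caching the file per host would avoid refetching it on every check.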
# Step 6: Analysing Price History
Once you have a few weeks of data, you can answer useful questions.
import pandas as pd


def price_report(db_path: str = "prices.db") -> pd.DataFrame:
    """Generate a summary report of price movements."""
    db = sqlite_utils.Database(db_path)
    df = pd.DataFrame(db.execute("""
        SELECT
            product_name,
            MIN(price) as min_price,
            MAX(price) as max_price,
            AVG(price) as avg_price,
            COUNT(*) as observations,
            MIN(scraped_at) as first_seen,
            MAX(scraped_at) as last_seen
        FROM prices
        GROUP BY product_name
    """).fetchall(), columns=[
        "product", "min", "max", "avg", "obs", "first_seen", "last_seen"
    ])
    # spread between lowest and highest price, relative to the average
    df["range_pct"] = ((df["max"] - df["min"]) / df["avg"] * 100).round(1)
    return df
def find_price_drops(db_path: str = "prices.db",
                     min_drop_pct: float = 10.0) -> list[dict]:
    """Find significant price drops in the history.

    Useful for spotting competitor clearance sales or loss leaders.
    """
    db = sqlite_utils.Database(db_path)
    results = []
    for product_name in db.execute(
        "SELECT DISTINCT product_name FROM prices"
    ).fetchall():
        name = product_name[0]
        prices = db.execute(
            "SELECT price, scraped_at FROM prices "
            "WHERE product_name = ? ORDER BY scraped_at ASC",
            [name],
        ).fetchall()
        # compare each observation with the one before it
        for i in range(1, len(prices)):
            prev_price = prices[i - 1][0]
            curr_price = prices[i][0]
            if prev_price > 0:
                change = (curr_price - prev_price) / prev_price * 100
                if change <= -min_drop_pct:
                    results.append({
                        "product": name,
                        "from_price": prev_price,
                        "to_price": curr_price,
                        "drop_pct": round(change, 1),
                        "date": prices[i][1],
                    })
    return results
The range_pct column shows how volatile each competitor's pricing is. A product with 30% range is being actively managed — watch it closely. A product with 2% range has stable pricing and probably is not worth checking every 4 hours.
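To make the range_pct arithmetic concrete, here is a plain-Python version of the same formula plus a hypothetical rule that maps volatility to a check interval. The cut-offs (20% and 5%) and the interval values are illustrative assumptions, not recommendations from measured data.

```python
def range_pct(prices: list[float]) -> float:
    """Spread between min and max price as a percentage of the average."""
    avg = sum(prices) / len(prices)
    return round((max(prices) - min(prices)) / avg * 100, 1)


def suggested_interval_hours(volatility_pct: float) -> int:
    """Hypothetical mapping from price volatility to check frequency."""
    if volatility_pct >= 20:
        return 4    # actively managed pricing: keep the 4-hour cadence
    if volatility_pct >= 5:
        return 12
    return 24       # stable pricing: a daily check is probably enough


active = range_pct([100.0, 85.0, 70.0, 95.0])   # (100 - 70) / 87.5 * 100
stable = range_pct([50.0, 50.5, 49.5])          # (50.5 - 49.5) / 50 * 100
print(active, suggested_interval_hours(active))
print(stable, suggested_interval_hours(stable))
```

Acting on this would mean scheduling products individually rather than running one loop over PRODUCTS every 4 hours.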
# Step 7: Hardening for Production
## Handle Stale Selectors
def selector_health_check(products: list[ProductConfig]):
    """Check if selectors are still working.

    Run this weekly. If a selector starts failing, you know the site
    was redesigned before you lose days of data.
    """
    broken = []
    for product in products:
        result = scrape_product(product)
        if result is None:
            broken.append(product.name)
            logger.error(f"All selectors broken for {product.name}")
        elif result.get("title") is None and product.title_selector:
            logger.warning(f"Title selector broken for {product.name}")
    if broken:
        # reuse the price alert email, with placeholder prices of 0
        send_alert_email([{
            "product": name,
            "old_price": 0,
            "new_price": 0,
            "change_pct": 0,
            "direction": "SELECTOR BROKEN, needs manual fix",
        } for name in broken])
    return broken
## Run with System Crontab
For production, use crontab instead of the Python scheduler:
# check prices every 4 hours
0 */4 * * * cd /srv/apps/price-bot && /usr/bin/python3 monitor.py --once >> /var/log/price-bot.log 2>&1
Add a --once flag to your script:
import sys

if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(name)s] %(levelname)s %(message)s",
    )
    if "--once" in sys.argv:
        run_check()
    else:
        main()  # run the schedule loop
# What This Replaces
| Manual process | Bot equivalent |
|---|---|
| Checking competitor sites by hand | Automated checks every 4 hours |
| Spreadsheet of prices updated weekly | SQLite database with full history |
| Missing price drops because you forgot to check | Email alerts on the next check after a threshold is crossed |
| Not knowing when competitors run sales | Historical analysis shows pricing patterns |
| Paying $200/month for a SaaS price monitoring tool | Your own bot for the cost of compute |
# Next Steps
For building the web scraping foundations that this bot builds on, see Web Scraping to Structured Data: Building Reliable Extraction Pipelines. For automating your Shopify store reports alongside competitor monitoring, see How to Automate Shopify Reports Using the Python API. For building alerting systems that actually get read, see Build a Notification System That Actually Gets Read. For securing the credentials this bot uses, see Python Secrets Management for Automation Pipelines.
# Frequently Asked Questions
- Why use Playwright instead of BeautifulSoup or requests?
- Many ecommerce sites render prices with JavaScript. Requests and BeautifulSoup only see the raw HTML before JavaScript runs, so the price elements are empty. Playwright runs a real browser that executes JavaScript, waits for the page to load, and then extracts the rendered content.
- Will I get blocked by websites?
- Possibly, if you scrape too aggressively. The bot in this guide uses randomized delays between requests, sends a realistic browser user agent, and runs every four hours rather than every second. It does not rotate user agents or check robots.txt on its own, so add a robots.txt check (httpx is installed for that) if you need one. For most competitor monitoring use cases, checking prices a few times per day is enough and unlikely to trigger blocks.
- Can this monitor prices on Amazon or Shopify stores?
- Yes. The bot uses CSS selectors that you configure per site, so it works on any website whose price you can target with a selector. The example configs in Step 1 show the selector patterns to adapt. Amazon has aggressive bot detection, so for Amazon specifically you may want to look at their Product Advertising API instead.
- How do I run this on a schedule?
- The guide includes a loop built on the schedule library for simple cases and a system crontab entry for production. You can also wrap it in a Prefect flow if you already use Prefect for orchestration.