Price APIs use headless browsers (Playwright/Puppeteer) to render product pages, CSS selectors to locate price elements, and parsing logic to convert raw text into structured JSON. The hard part is handling anti-bot measures, edge cases, and ongoing selector maintenance.
When you call a price API, a multi-step pipeline executes behind the scenes:
1. URL validation: the API checks that your URL belongs to a supported retailer domain and is a valid product page URL (not a search results page, category page, or homepage).
2. Browser rendering: a headless browser (typically Chromium via Playwright or Puppeteer) navigates to the product URL. The browser executes JavaScript, loads dynamic content, and renders the page as a real browser would. This is necessary because most e-commerce sites load prices dynamically via JavaScript — a simple HTTP GET request would return an empty price placeholder.
3. Resource blocking: to speed up page loads, the browser blocks unnecessary resources like images, fonts, CSS stylesheets, and tracking scripts. Only the HTML structure and JavaScript needed to render pricing data are loaded.
4. Element selection: CSS selectors target the price element on the page. Each retailer has different HTML structure, so each requires retailer-specific selectors. Amazon's price might be in a `.a-price .a-offscreen` element while Walmart's is in a `[data-testid="price"]` element.
5. Parsing: the raw text content (e.g., "$29.99" or "EUR 24,99") is converted to a numeric value and currency code. This handles locale-specific formatting — commas as decimal separators in European pricing, yen with no decimal places, etc.
6. Response: the structured data is returned as JSON with price, currency, stock status, and metadata.
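The response in step 6 can be sketched as plain JSON. The source names the fields at a high level (price, currency, stock status, metadata); the exact key names and the timestamp field below are illustrative assumptions, not a documented schema.

```python
import json

# A hypothetical response body -- key names are illustrative,
# not a documented schema.
raw = """{
    "price": 29.99,
    "currency": "USD",
    "in_stock": true,
    "retailer": "amazon",
    "scraped_at": "2024-01-15T10:32:00Z"
}"""

data = json.loads(raw)
print(data["price"], data["currency"])  # 29.99 USD
```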
Modern e-commerce sites are JavaScript applications. When you load an Amazon product page, the initial HTML contains layout scaffolding but no price data. The price is injected by JavaScript after the page loads — often through API calls that the browser makes to Amazon's internal services.
A simple HTTP request with a library like `requests` or `fetch` only gets the initial HTML. No JavaScript executes, so no price appears. Headless browsers solve this by running a full Chromium instance that executes JavaScript, waits for dynamic content to render, and then gives you the fully-rendered DOM.
The trade-off is resource usage. Each headless browser instance consumes 200-500MB of RAM. Running multiple concurrent instances requires significant server resources. This is one reason price APIs are not free — the infrastructure costs are real.
Some simpler retail sites do include prices in their initial HTML (server-side rendered). For those, a direct HTTP request suffices and is much faster. Price APIs typically use the fastest approach available for each retailer.
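One way to pick the fastest approach per retailer is a cheap heuristic on the initial HTML: if a price-like token is already present, the page is probably server-side rendered and a plain GET suffices; otherwise, fall back to a headless browser. This is a rough sketch — the regex and the sample HTML snippets are illustrative assumptions.

```python
import re

# Illustrative heuristic: a currency symbol followed by digits in the
# raw HTML suggests the price was server-side rendered.
PRICE_PATTERN = re.compile(r"[$€£¥]\s?\d[\d.,]*")

def needs_browser(initial_html: str) -> bool:
    # True when no price-like token is present in the static HTML,
    # meaning a headless browser must render the page.
    return PRICE_PATTERN.search(initial_html) is None

ssr_html = '<span class="price">$29.99</span>'      # server-rendered page
spa_html = '<div id="root"></div><script src="app.js"></script>'  # JS app shell

print(needs_browser(ssr_html))  # False
print(needs_browser(spa_html))  # True
```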
Finding the price on a product page requires knowing where to look. Each retailer uses different HTML structure, class names, and data attributes. A scraper for Amazon needs different selectors than a scraper for Walmart.
Selectors are the most fragile part of the pipeline. When a retailer redesigns their product page or changes their CSS class names, selectors break. Amazon changes their price element structure multiple times per year. A professional price API has automated monitoring that detects selector failures quickly and engineering resources to update them.
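A retailer-specific selector registry might look like the sketch below. The Amazon and Walmart selectors come from the examples above; the idea of keeping an ordered fallback list (so a scraper can try an older selector when the primary one breaks) is an assumption about how a production API would hedge against redesigns, and the second selector in each list is hypothetical.

```python
# Ordered selector lists per retailer: the scraper tries each selector
# in turn until one matches. Fallback entries are illustrative.
SELECTORS = {
    "amazon.com": [".a-price .a-offscreen", "#priceblock_ourprice"],
    "walmart.com": ['[data-testid="price"]', '[itemprop="price"]'],
}

def selectors_for(domain: str) -> list:
    try:
        return SELECTORS[domain]
    except KeyError:
        raise ValueError(f"unsupported retailer: {domain}")

print(selectors_for("amazon.com")[0])  # .a-price .a-offscreen
```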
Once the raw price text is extracted, it needs to be parsed into a number. This is harder than it sounds:
- "$29.99" is 29.99 USD - "29,99 EUR" is 29.99 EUR (comma as decimal separator) - "2,499" is 2499.00 (comma as thousands separator in US format) - "2.499,00" is 2499.00 (European format with dot as thousands separator) - "\u00a529,800" is 29800 JPY (yen, no decimal places)
A good price parser handles all of these formats by using the retailer's locale and currency conventions rather than making assumptions about number formatting.
```python
# Simplified price parsing logic
import re
from decimal import Decimal

def parse_price(text: str, locale: str = "en-US") -> Decimal:
    # Remove currency symbols and whitespace
    cleaned = re.sub(r"[^\d.,]", "", text.strip())
    if locale in ("de-DE", "fr-FR", "it-IT"):
        # European: 1.234,56 -> 1234.56
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        # US/UK: 1,234.56 -> 1234.56
        cleaned = cleaned.replace(",", "")
    return Decimal(cleaned)
```

Retailers do not want bots visiting their pages. They deploy several countermeasures:
IP rate limiting: too many requests from the same IP address trigger blocks. Price APIs use pools of rotating proxy IPs to distribute requests across many addresses.
CAPTCHAs: retailers serve CAPTCHAs when they suspect automated access. Some APIs integrate CAPTCHA solving services. Others rely on proxy quality and browser fingerprinting to avoid triggering CAPTCHAs in the first place.
Browser fingerprinting: sites check browser properties like screen size, installed fonts, WebGL rendering, and plugin lists to identify headless browsers. Modern scraping tools randomize these properties to appear as regular browsers.
JavaScript challenges: some sites run JavaScript that tests for browser automation (checking for Playwright/Puppeteer-specific properties). Stealth plugins patch these detectable properties.
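The first countermeasure above, IP rate limiting, is usually answered with proxy rotation. A minimal round-robin sketch follows; real pools also track per-proxy health, ban status, and retailer-specific cooldowns, and the addresses below are placeholders.

```python
import itertools

# Placeholder proxy addresses -- a real pool would hold hundreds of
# residential or datacenter IPs.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]

rotation = itertools.cycle(PROXIES)

def next_proxy() -> str:
    # Each call hands out the next IP, spreading requests evenly.
    return next(rotation)

first, second, third, fourth = (next_proxy() for _ in range(4))
print(fourth)  # back to http://10.0.0.1:8080 after one full cycle
```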
This is an arms race. Retailers improve their detection, scraping tools improve their evasion. Price APIs invest continuously in staying ahead of detection methods — that ongoing investment is part of what you pay for.
A production price API processes thousands of concurrent requests. The architecture typically includes:
A browser pool: pre-started Chromium instances that are reused across requests. Starting a new browser per request is too slow. The pool manages allocation, recycling, and crash recovery.
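A browser pool can be modeled with a blocking queue: checkout blocks when every instance is busy, and instances return to the pool after use. This sketch uses stand-in objects instead of real Chromium instances; a production pool would also health-check and recycle crashed browsers, as described above.

```python
import queue
from contextlib import contextmanager

class FakeBrowser:
    """Stand-in for a pre-started Chromium instance."""
    def __init__(self, browser_id: int):
        self.browser_id = browser_id

POOL_SIZE = 3
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(FakeBrowser(i))

@contextmanager
def acquire(timeout: float = 5.0):
    # Blocks (up to `timeout`) when all browsers are checked out --
    # this is the backpressure the article describes.
    browser = pool.get(timeout=timeout)
    try:
        yield browser
    finally:
        pool.put(browser)  # recycle the instance for the next request

with acquire() as browser:
    print(f"rendering with browser {browser.browser_id}")
print(pool.qsize())  # 3 -- the browser was returned after use
```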
A queue system: incoming API requests are queued when all browser instances are busy. This prevents overloading the server and provides backpressure to callers via rate limiting.
Retailer-specific scrapers: each retailer has its own scraper module with custom selectors, parsing logic, and edge case handling. A URL router maps the incoming URL to the correct scraper.
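The URL router can be as simple as a hostname lookup. In this sketch the hostname of the incoming URL selects a scraper; the scraper module names are hypothetical.

```python
from urllib.parse import urlparse

# Hostname -> scraper module name. The module names are hypothetical.
SCRAPERS = {
    "www.amazon.com": "amazon_scraper",
    "www.walmart.com": "walmart_scraper",
}

def route(url: str) -> str:
    host = urlparse(url).netloc.lower()
    scraper = SCRAPERS.get(host)
    if scraper is None:
        # Unsupported retailers are rejected up front (step 1 of the pipeline).
        raise ValueError(f"unsupported retailer: {host}")
    return scraper

print(route("https://www.amazon.com/dp/B08N5WRWNW"))  # amazon_scraper
```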
Monitoring and alerting: automated checks detect when a scraper's success rate drops (indicating broken selectors), when response times spike, or when a retailer starts blocking requests more aggressively.
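Detecting a drop in success rate can be done with a sliding window over recent scrape outcomes. The window size and the 80% alert threshold below are illustrative choices, not values from the article.

```python
from collections import deque

class ScraperMonitor:
    """Sliding-window success-rate check for one retailer's scraper."""
    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.results = deque(maxlen=window)  # True = successful scrape
        self.threshold = threshold

    def record(self, success: bool) -> None:
        self.results.append(success)

    def healthy(self) -> bool:
        if not self.results:
            return True  # no data yet, nothing to alert on
        rate = sum(self.results) / len(self.results)
        return rate >= self.threshold

monitor = ScraperMonitor(window=10)
for ok in [True] * 6 + [False] * 4:  # 60% success over the window
    monitor.record(ok)
print(monitor.healthy())  # False -- below the 80% threshold, page alerting
```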
The entire pipeline from API call to JSON response typically takes 3-8 seconds, with most of that time spent waiting for the browser to render the product page.
Sign up in 30 seconds. No credit card required. One credit per successful API call.