Only 4 of 50 retailers returned a readable price.
When we pointed a deliberately simple AI shopping agent at the product pages of 50 of the largest US retailers and asked one question, "can you read the price?", it found a machine-readable price on only 4. Nineteen retailers blocked it outright. Seventeen more had no discoverable product URLs. The remaining 10 returned pages with no machine-readable price. For a simple agent, the gap between what it can read and what a human shopper sees is wide.
The four outcomes
Blocked: 19 of 50. The agent was refused. 17 retailers refused it by HTTP status (403 or 429). 2 more (Amazon, Walmart) returned a 200 status but served a bot wall: an "access denied" or captcha interstitial in place of the product page. We detect walls by page content, not status code, so a wall is always counted as blocked and never as a real page with no data.
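The status-plus-content rule can be sketched as follows. This is a minimal illustration, not the study's detector. The marker phrases are assumptions based on the two wall types described above. A plain substring match is also crude, because a genuine product page could mention "captcha" inside a script.

```python
# Hypothetical marker phrases, based on the "access denied" and captcha
# interstitials described in the text; not the study's actual detection list.
REFUSAL_STATUSES = {403, 429}
WALL_MARKERS = ("access denied", "captcha")


def is_bot_wall(body: str) -> bool:
    """True if the HTML looks like a bot-wall interstitial (crude substring check)."""
    text = body.lower()
    return any(marker in text for marker in WALL_MARKERS)


def is_refusal(status: int, body: str) -> bool:
    """A refusal is a 403/429 status, or a 200 response whose content is a bot wall."""
    if status in REFUSAL_STATUSES:
        return True
    return status == 200 and is_bot_wall(body)
```

The content check on 200 responses is what keeps a wall like the ones Amazon and Walmart served out of the "no readable data" bucket.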
Readable: 4 of 50. The agent loaded a genuine product page and found a machine-readable price. Nike $115. Reebok $85. Crocs $29.99. Allbirds $110.
Unreachable: 17 of 50. We could not obtain a working product URL for these retailers because their sites block automated URL discovery (sitemaps, site search). That an agent cannot even find a product page is part of the finding, not a gap in our method.
No readable data: 10 of 50. The page returned HTTP 200 with no bot wall but exposed no machine-readable price. This bucket is mixed. At least two cases are confirmed genuine: Target and Foot Locker serve real product pages whose price is rendered client-side only, so it is absent from the server HTML a plain agent receives. The rest are generic, redirected, or mismatched URLs. These are our own discovery misses, not retailer behavior.
What we cannot claim
This is a test of simple agent readability, not a ceiling on what a browser-based or privately integrated agent can do. A more capable agent may reach pages this one could not.
The 4-of-50 readable figure is a lower bound. Our own URL-discovery failures and the HTTP-only method likely undercount pages that would yield data to a stronger agent.
We do not claim that large retailers generally lack structured data. We measured what a naive agent could read on one day, not site-wide schema adoption.
All figures are a single June 2026 snapshot.
Method
- Sample: 50 retailers drawn from the NRF Top 100 (US), list frozen before any crawling.
- Snapshot: June 2026, single pass.
- Agent type: Naive HTTP fetch. Not a stealth browser, not a headless crawler, not a partner agent program.
- Signal: A schema.org Product or Offer price in the server-rendered HTML, accepted as a number or numeric string.
- Human check: Full-page screenshot of the same URL, taken at the same time.
Cite as: CrawlSpace Labs, "Ghost Price: The Price AI Agents Read", https://crawlspacelabs.dev/ghost-price.
Evidence
Per retailer we retain the screenshot, raw HTML, HTTP status, verdict (blocked, readable, no readable data, unreachable), and verification date. The evidence is stored alongside the source code in the ghost-price/evidence/ directory. This page functions as an evidence index rather than a narrative with hidden working.
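A per-retailer evidence record might look like the following sketch. The field names and the per-retailer subdirectory layout under ghost-price/evidence/ are hypothetical. Only the root directory and the list of retained items come from this page.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# The four verdicts used on this page.
VERDICTS = ("blocked", "readable", "no readable data", "unreachable")
EVIDENCE_ROOT = PurePosixPath("ghost-price/evidence")


@dataclass(frozen=True)
class EvidenceRecord:
    """One retailer's evidence entry (hypothetical field names)."""

    retailer: str
    url: Optional[str]          # None when no product URL could be discovered
    http_status: Optional[int]  # None when no request was made
    verdict: str
    verified: date

    def __post_init__(self) -> None:
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")

    def artifact_dir(self) -> PurePosixPath:
        # Hypothetical layout: one subdirectory per retailer under the evidence root.
        slug = self.retailer.lower().replace(" ", "-")
        return EVIDENCE_ROOT / slug

    def to_json(self) -> str:
        """Serialize the record plus paths to its screenshot and raw HTML."""
        record = asdict(self)
        record["verified"] = self.verified.isoformat()
        record["screenshot"] = str(self.artifact_dir() / "screenshot.png")
        record["raw_html"] = str(self.artifact_dir() / "page.html")
        return json.dumps(record, sort_keys=True)
```

Storing `None` for `url` and `http_status` lets an unreachable retailer carry a record with the same shape as the others.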
Context
The price a customer sees online and the price an AI agent reads can diverge. Where retailers publish structured data, the values may be stale, incorrect, or deliberately misleading, so agents trying to comparison-shop can end up with wrong data. This pilot measures a prior question among large retailers: whether a simple agent can read a price at all.
The agent in this study makes a plain HTTP GET for server-rendered HTML, which is what a simple crawler sees. It is not a secret or stealth agent. It treats HTTP 403, HTTP 429, and 200-status bot walls as refusals. It counts only genuine product pages with parseable prices as readable. It follows a conservative rule: every page was hand-verified against the live retailer URL before publication.
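The classification rules above reduce to a short decision function. This sketch assumes that the wall check and price extraction have already run and arrive as inputs. The study does not say how statuses other than 200, 403, and 429 are bucketed, so this version places them in "no readable data" as an assumption.

```python
from typing import Optional

REFUSAL_STATUSES = {403, 429}


def verdict(
    url: Optional[str],
    status: Optional[int],
    is_bot_wall: bool,
    is_product_page: bool,
    price: Optional[float],
) -> str:
    """Map one retailer's fetch outcome to one of the four buckets (hypothetical helper)."""
    if url is None:
        return "unreachable"  # no product URL could be discovered
    if status in REFUSAL_STATUSES or (status == 200 and is_bot_wall):
        return "blocked"  # explicit refusal, or a wall served with a 200 status
    if status == 200 and is_product_page and price is not None:
        return "readable"  # genuine product page with a parseable price
    return "no readable data"  # includes unlisted statuses, by assumption
```

Checking for a missing URL first means a retailer that blocks discovery never reaches the refusal check. This matches the separate "unreachable" bucket.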
Status
This is a pilot. Phase 1 is complete, and its results are published here. Phase 2 will scale to a larger, pre-registered sample only after the method has held up and the accuracy-first brand is secure.
Last verified: 2026-06-20.