honors robots.txt; if a site allows both GPTBot and OAI-SearchBot, OpenAI may use one crawl for both purposes
Crawler Verdict
AI crawler compliance registry.
A primary-sourced database of AI crawler tokens, their claimed robots.txt behavior, and documented disputes. Every entry shows what the vendor states, the source of that claim, and any disputed behavior we've observed or been told about. Entries marked verified have been sourced to official documentation; entries marked pending are still being verified against primary sources.
Verified entries (3)
Sourced to official vendor documentation.
OpenAI
honors robots.txt; independently controlled from GPTBot
user-initiated fetch; robots.txt may not apply (OpenAI Dec 2025 crawler-docs revision)
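As a rough illustration of what "independently controlled" means in practice, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt that blocks GPTBot while allowing OAI-SearchBot. The robots.txt content, the `allowed_tokens` helper, and the example URL are assumptions made for illustration. The result shows only what a compliant crawler should conclude from the file, not how any crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the GPTBot token, allow the OAI-SearchBot token.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed_tokens(robots_txt: str, tokens: list[str], url: str) -> dict[str, bool]:
    """Return, per crawler token, whether the rules permit fetching `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


result = allowed_tokens(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot"],
    "https://example.com/article",  # placeholder URL
)
print(result)
```

The same check can be run against a site's real robots.txt by passing its text to the helper. A user-initiated fetch may not consult these rules at all, so a passing check here says nothing about that case.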
Pending verification (12)
These entries are stubs still being sourced to official documentation.
Anthropic
stated to honor robots.txt; independently controllable
stated to honor robots.txt; independently controllable
stated to honor robots.txt; independently controllable
Perplexity
respects robots.txt
Disputed: undeclared Perplexity crawlers observed circumventing robots.txt
robots.txt-only opt-out token; not a user-agent in logs
Apple
robots.txt-only training opt-out token; does not appear in logs
Common Crawl
Amazon
Meta
stated by the vendor to honor robots.txt (vendor statement only)
ByteDance
no official documentation
Disputed: reported to ignore robots.txt
About this data
This registry is a primary-sourced database maintained in the crawlspace-labs repository. Every verified entry is backed by a link to official vendor documentation and the date it was last checked. Disputed entries show both the vendor's stated behavior and the documented contradictions.
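One way to picture an entry is as a small record with a consistency check, sketched below. The repository's actual schema is not shown here, so the class name, every field name, the `problems` method, and the sample values are hypothetical. The sketch only encodes the rules stated above: a verified entry needs a source link and a last-checked date, and a dispute is recorded alongside the vendor's stated behavior.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class CrawlerEntry:
    # All field names are hypothetical; the real registry schema may differ.
    vendor: str
    token: str
    stated_behavior: str
    status: str = "pending"  # "verified" or "pending"
    source_url: Optional[str] = None
    last_checked: Optional[date] = None
    disputes: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List reasons this entry does not meet the registry's stated rules."""
        issues = []
        if self.status not in ("verified", "pending"):
            issues.append(f"unknown status: {self.status}")
        if self.status == "verified":
            if not self.source_url:
                issues.append("verified entry lacks a source link")
            if self.last_checked is None:
                issues.append("verified entry lacks a last-checked date")
        return issues


# Sample entry with placeholder values; it is missing its source link and date.
entry = CrawlerEntry(
    vendor="OpenAI",
    token="GPTBot",
    stated_behavior="honors robots.txt",
    status="verified",
)
print(entry.problems())
```

Keeping disputes in a separate list, instead of overwriting the stated behavior, lets both the vendor's claim and the contradicting reports appear side by side.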
If you spot an error or have a primary source to contribute, reach out. The registry is a living document.