Crawler Verdict

AI crawler compliance registry.

A primary-sourced database of AI crawler tokens, their claimed robots.txt behavior, and documented disputes. Every entry shows what the vendor states, the source of that claim, and any disputed behavior that we have observed or that has been reported to us. Entries marked verified have been sourced to official vendor documentation; entries marked pending are still being checked against primary sources.
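One way to picture the shape of an entry is as a small record type. The sketch below is illustrative only: the class name `CrawlerEntry`, its field names, and the rule that derives `status` from the presence of a source and a check date are assumptions made for this example, not the registry's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CrawlerEntry:
    """Hypothetical record for one crawler token (not the registry's real schema)."""
    vendor: str
    token: str
    purpose: str                    # e.g. "training", "search", "user fetch"
    stated_behavior: str            # what the vendor states
    robots_txt: str                 # e.g. "honored", "stated to honor", "unstated"
    source: Optional[str] = None    # link to official vendor documentation
    as_of: Optional[date] = None    # date the source was last checked
    dispute: Optional[str] = None   # documented contradiction, if any

    @property
    def status(self) -> str:
        # Assumption: an entry counts as verified once it has both a source and a date.
        return "verified" if self.source and self.as_of else "pending"

    @property
    def disputed(self) -> bool:
        return self.dispute is not None


gptbot = CrawlerEntry(
    vendor="OpenAI",
    token="GPTBot",
    purpose="training",
    stated_behavior="honors robots.txt",
    robots_txt="honored",
    source="https://developers.openai.com/api/docs/bots",
    as_of=date(2026, 6, 15),
)

perplexitybot = CrawlerEntry(
    vendor="Perplexity",
    token="PerplexityBot",
    purpose="search",
    stated_behavior="respects robots.txt",
    robots_txt="stated to honor",
    dispute="undeclared Perplexity crawlers observed circumventing robots.txt",
)

print(gptbot.status, perplexitybot.status, perplexitybot.disputed)
```

Keeping the vendor's stated behavior and the dispute in separate fields mirrors how the listings below present both sides of a disputed entry.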

Verified entries (3)

Sourced to official vendor documentation.

OpenAI

GPTBot training

honors robots.txt; if a site allows both GPTBot and OAI-SearchBot, OpenAI may use one crawl for both purposes

robots.txt honored

source https://developers.openai.com/api/docs/bots

as of 2026-06-15

OAI-SearchBot search

honors robots.txt; independently controlled from GPTBot

robots.txt honored

source https://developers.openai.com/api/docs/bots

as of 2026-06-15

ChatGPT-User user fetch

user-initiated fetch; robots.txt may not apply (OpenAI Dec 2025 crawler-docs revision)

robots.txt not honored

source https://developers.openai.com/api/docs/bots

as of 2026-06-15
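Because GPTBot and OAI-SearchBot are listed as independently controlled, a site can write separate robots.txt groups for each. The sketch below uses Python's standard `urllib.robotparser` to evaluate a sample policy that blocks training crawls while allowing search crawls. The policy text, the helper name `allowed_tokens`, and the example URL are assumptions for illustration. The check shows only what the rules say, not whether a given crawler follows them.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: block the training token, allow the search token.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed_tokens(robots_txt: str, tokens: list[str], url: str) -> dict[str, bool]:
    """Return, per token, whether the given rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


result = allowed_tokens(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot"],
    "https://example.com/article",
)
print(result)  # {'GPTBot': False, 'OAI-SearchBot': True}
```

A similar group for ChatGPT-User would parse the same way, but since that entry is listed as not honoring robots.txt, the rule could not be relied on to stop user-initiated fetches.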

Pending verification (12)

These entries are stubs still being sourced to official documentation.

Anthropic

ClaudeBot training pending

stated to honor robots.txt; independently controllable

robots.txt stated to honor

Claude-SearchBot search pending

stated to honor robots.txt; independently controllable

robots.txt stated to honor

Claude-User user fetch pending

stated to honor robots.txt; independently controllable

robots.txt stated to honor

Perplexity

PerplexityBot search disputed pending

respects robots.txt

robots.txt stated to honor

Disputed: undeclared Perplexity crawlers observed circumventing robots.txt

Perplexity-User user fetch pending

robots.txt unstated

Google

Google-Extended opt out token pending

robots.txt-only opt-out token; not a user-agent in logs

robots.txt stated to honor

Apple

Applebot search pending

robots.txt stated to honor

Applebot-Extended opt out token pending

robots.txt-only training opt-out token; does not appear in logs

robots.txt stated to honor
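Since Google-Extended and Applebot-Extended are described as robots.txt-only tokens that do not appear in logs, the opt-out has to be expressed in robots.txt rather than by filtering on user-agent. The following is a minimal sketch that generates one group per opt-out token and checks the result with Python's standard parser. The helper name `opt_out_stanzas` and the choice of a site-wide `Disallow: /` are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser

# Opt-out tokens taken from the listings above.
TRAINING_OPT_OUT_TOKENS = ["Google-Extended", "Applebot-Extended"]


def opt_out_stanzas(tokens: list[str]) -> str:
    """Build one robots.txt group per token, each disallowing the whole site."""
    blocks = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(blocks) + "\n"


robots_txt = opt_out_stanzas(TRAINING_OPT_OUT_TOKENS)
print(robots_txt)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

extended_blocked = not parser.can_fetch("Google-Extended", "https://example.com/")
search_unaffected = parser.can_fetch("Applebot", "https://example.com/")
print(extended_blocked, search_unaffected)
```

Under Python's parser, these groups do not match the plain `Applebot` token, so the search crawler stays unaffected by the generated rules. How each vendor interprets the tokens is governed by its own documentation, which the pending entries above have not yet been verified against.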

Common Crawl

CCBot training pending

robots.txt stated to honor

Amazon

Amazonbot search pending

robots.txt stated to honor

ByteDance

Bytespider training disputed pending

no official documentation

robots.txt unstated

Disputed: reported to ignore robots.txt

About this data

This registry is a primary-sourced database maintained in the crawlspace-labs repository. Every verified entry is backed by a link to official vendor documentation and a date it was last checked. Disputed entries show both the vendor's stated behavior and documented contradictions.

If you spot an error or have a primary source to contribute, reach out. The registry is a living document.
