How AI agents discover your website

Last updated: December 16, 2025

This guide covers the fundamentals of making your website discoverable by AI agents such as ClaudeBot, GPTBot, PerplexityBot, and other AI crawlers. These steps work for any website, regardless of your tech stack or hosting platform.


Step 1: Allow AI crawlers in robots.txt

Update your /robots.txt file to explicitly allow AI agent crawlers. Most crawlers treat the absence of a rule as permission to crawl, but listing these agents explicitly makes your intent unambiguous.

# AI Agent Crawlers (2025)
User-agent: ClaudeBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Anthropic-AI
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ChatGPT-User
Allow: /

# Standard search crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

# Link to your sitemaps
Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap-pages.xml

💡 Tip: If you want to block AI training bots while allowing search and discovery bots, you can selectively disallow those agents, as in the sketch below. Most modern AI crawlers respect robots.txt, so allowing them is generally safe.
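
For example, here is a minimal sketch that opts out of model training by OpenAI and Google while still permitting user-initiated ChatGPT lookups and Perplexity's search indexing. Keep the rest of your robots.txt as above, and confirm the current token names in each vendor's documentation:

# Block training-focused bots
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Keep user-initiated and search/discovery bots
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /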

Known AI Crawlers (December 2025)

Bot Name          Company        Purpose
ClaudeBot         Anthropic      Discovery, indexing, training
GPTBot            OpenAI         Training, search integration
ChatGPT-User      OpenAI         User-initiated searches
PerplexityBot     Perplexity AI  Search indexing
Google-Extended   Google         AI training (Gemini)
Anthropic-AI      Anthropic      Research crawling

Step 2: Create and submit XML sitemaps

Sitemaps are the primary way AI agents discover your content structure. Create a comprehensive sitemap that includes all important pages.

Basic sitemap structure:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2025-12-16</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://yourdomain.com/about</loc>
    <lastmod>2025-12-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <!-- Add more URLs -->
</urlset>

Submit to search engines:

  1. Google Search Console: Add https://yourdomain.com/sitemap.xml
  2. Bing Webmaster Tools: Add https://yourdomain.com/sitemap.xml

💡 Tip: For large sites, split your URLs into multiple sitemaps (sitemap-pages.xml, sitemap-blog.xml, etc.) and create a sitemap index that references them all, as in the example below.
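
As a sketch, a sitemap index that references the two child sitemaps named in the tip above could look like this (the dates are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yourdomain.com/sitemap-pages.xml</loc>
    <lastmod>2025-12-16</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemap-blog.xml</loc>
    <lastmod>2025-12-10</lastmod>
  </sitemap>
</sitemapindex>

Submit the index URL itself (for example https://yourdomain.com/sitemap.xml) to Google Search Console and Bing Webmaster Tools; the child sitemaps are discovered through it.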

Step 3: Create an agent discovery manifest

The /.well-known/agent-discovery.json file provides AI agents with structured guidance about your site's capabilities and important endpoints.

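There is no single standardized schema for this manifest yet, so the following is only an illustrative sketch under that assumption; JSON does not allow comments, so note that every field name and URL below is a placeholder to adapt to your own site:

{
  "name": "Your Site Name",
  "description": "One-sentence summary of what your site offers.",
  "sitemaps": [
    "https://yourdomain.com/sitemap.xml"
  ],
  "documentation": "https://yourdomain.com/docs",
  "contact": "webmaster@yourdomain.com"
}

Serve the file with the application/json content type and make sure it is reachable at https://yourdomain.com/.well-known/agent-discovery.json over HTTPS without authentication.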


Verification Checklist

  1. /robots.txt loads without errors and allows the AI crawlers you want to reach your site.
  2. /sitemap.xml is valid XML, lists your important pages, and has been submitted to Google Search Console and Bing Webmaster Tools.
  3. /.well-known/agent-discovery.json (if you use one) returns valid JSON.

Resources


Key Terms

Agent-First SEO, Agent Engine Optimization (AEO), AI Agent Discovery, Agent Discovery Manifest, ClaudeBot, GPTBot, AI Agent Crawling, Structured Data, JSON-LD, Semantic HTML


Last updated: December 16, 2025 | Open-source guide for AI agent optimization fundamentals