How to Block AI Bots with robots.txt in 2026
Why Block AI Bots?
AI companies train their large language models by crawling the web and ingesting website content. In 2026, this practice has become a significant concern for publishers, creators, and businesses who do not want their content used for AI training without consent or compensation.
The major AI crawlers include:
- GPTBot — OpenAI’s crawler for training data
- ChatGPT-User — OpenAI’s crawler for real-time browsing features
- ClaudeBot — Anthropic’s crawler
- Google-Extended — Google’s crawler for Gemini AI training
- Bytespider — ByteDance’s crawler for training purposes
- CCBot — Common Crawl’s bot, used by many AI training datasets
- Meta-ExternalAgent — Meta’s AI training crawler
Blocking these bots does not affect your search engine rankings. Google has explicitly stated that blocking Google-Extended has no impact on your appearance in Google Search results. The regular Googlebot, which indexes your site for search, is a separate user agent.
Using robots.txt to Block AI Crawlers
The robots.txt file lives at the root of your website (e.g., https://example.com/robots.txt) and tells web crawlers which parts of your site they may access. Compliance is voluntary (a bot can simply ignore the file), but most major AI companies state that their crawlers honor robots.txt directives.
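Because robots.txt always sits at the site root, its location can be derived from any page URL. A minimal sketch using Python's standard library (example.com is a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the canonical robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of scheme + host, never in a subdirectory
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/some-post"))
# https://example.com/robots.txt
```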
Basic Syntax
Each rule in robots.txt has two parts: a User-agent directive specifying which bot the rule applies to, and a Disallow directive specifying what is off-limits.
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
The above blocks GPTBot, ClaudeBot, and Google-Extended from your entire site. The / means everything.
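You can verify rules like these locally with Python's standard-library urllib.robotparser before deploying (a sketch; the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The named AI bots are blocked everywhere...
print(rp.can_fetch("GPTBot", "https://example.com/any/page"))    # False
# ...but a bot not listed (and with no * group) is unaffected
print(rp.can_fetch("Googlebot", "https://example.com/any/page")) # True
```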
Block All Known AI Bots
Here is a comprehensive block list for 2026:
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
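Rather than hand-typing each User-agent/Disallow pair, you can generate the block list programmatically (a sketch; the bot list mirrors the one above, and build_block_list is a hypothetical helper):

```python
# User-agent tokens to block, matching the list above
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
    "Bytespider", "CCBot", "Meta-ExternalAgent", "anthropic-ai",
    "Applebot-Extended", "cohere-ai", "PerplexityBot",
]

def build_block_list(bots):
    """Emit one robots.txt group per bot, each blocking the whole site."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "# Block AI training crawlers\n" + "\n\n".join(groups) + "\n"

print(build_block_list(AI_BOTS))
```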
Allow Search Engines, Block AI
Make sure you are not accidentally blocking search engine crawlers. Your robots.txt should still allow Googlebot, Bingbot, and other search engine bots:
# Allow search engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# Block AI crawlers
User-agent: GPTBot
Disallow: /
If you use a blanket User-agent: * with Disallow: /, you will block every bot that does not have its own User-agent group, search engines included: crawlers follow the most specific matching group, so the * rules apply only to bots you have not named elsewhere. Be specific about which bots you want to block.
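This precedence behavior is easy to demonstrate with urllib.robotparser: a bot with its own User-agent group follows that group and ignores the * rules, while everything else falls through to the blanket block (a sketch with placeholder URLs):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot matches its own group, so the blanket * block never applies to it
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
# GPTBot has no specific group, so it falls through to the * rules
print(rp.can_fetch("GPTBot", "https://example.com/page"))     # False
```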
Using the seokit Robots.txt Generator
Building a robots.txt file manually is error-prone. A misplaced directive can accidentally block search engines or leave AI bots unblocked.
The seokit Robots.txt Generator provides a visual interface with one-click AI bot blocking. Select the bots you want to block from a checklist, configure your sitemap URL, and download a valid robots.txt file. The tool validates your configuration and warns about potential issues.
Partial Blocking
You may want AI bots to access some pages but not others. For example, you might allow your homepage and product pages while blocking your blog content:
User-agent: GPTBot
Disallow: /blog/
Disallow: /articles/
Allow: /
This tells GPTBot it may access everything except the /blog/ and /articles/ directories: the more specific Disallow rules take precedence over the general Allow: /.
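The same parser confirms the partial-blocking behavior (a sketch; the paths are examples):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /blog/
Disallow: /articles/
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/blog/post-1"))     # False
print(rp.can_fetch("GPTBot", "https://example.com/products/widget")) # True
```

Note that Python's parser applies rules in file order (first match wins) while Google applies the most specific (longest) matching rule; both conventions give the same answer for this file.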
Limitations of robots.txt
It Is Advisory, Not Enforced
robots.txt is a protocol based on good faith, and any bot can technically ignore it. However, major AI companies risk reputational damage and potential legal exposure if they violate robots.txt directives, so compliance is high among legitimate crawlers.
It Does Not Block Scraping
robots.txt prevents well-behaved bots from crawling. It does not prevent someone from manually copying your content or using tools that ignore robots.txt. For stronger protection, consider additional measures like rate limiting, authentication, or legal notices.
It Is Not Retroactive
Blocking GPTBot today does not remove content that was already crawled and ingested into a training dataset. It only prevents future crawling.
Additional Protection Measures
Meta Tags
The noai and noimageai values are an informal, emerging convention rather than part of the official robots meta-tag standard, and crawler support varies; treat them as a supplementary page-level signal:
<meta name="robots" content="noai, noimageai">
Use the seokit Meta Tag Generator to build these tags correctly.
HTTP Headers
Some sites use the X-Robots-Tag HTTP header for resources like PDFs and images that cannot contain meta tags.
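For example, in nginx you might attach the header to file types that cannot carry a meta tag (a sketch; the file-extension list is an assumption, and noai / noimageai carry the same limited-support caveat as the meta-tag form):

```nginx
# Send X-Robots-Tag on PDFs and images, which cannot embed <meta> tags
location ~* \.(pdf|jpe?g|png|gif|webp)$ {
    add_header X-Robots-Tag "noai, noimageai";
}
```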
Legal Notices
Add a clear statement to your terms of service prohibiting use of your content for AI training. This creates a legal basis for enforcement.
Generate Your robots.txt Now
Protecting your content from AI crawlers takes less than a minute. Use the seokit Robots.txt Generator to build a valid robots.txt file with AI bot blocking, sitemap references, and proper search engine access. Download it and upload it to your site’s root directory.