Tool Point

Robots.txt Generator

Create a robots.txt file that tells web crawlers which parts of your website they may crawl.

Robots.txt Generator (Create a Robots.txt File)

Generate a properly formatted robots.txt file for your website instantly. Control which pages search engine crawlers can access, add your sitemap location, and configure user-agent specific rules. Free online robots.txt builder with common templates and copy-paste output.

Perfect for managing crawler access, protecting admin areas, optimizing crawl budgets, and guiding search engines to important content.

What This Robots.txt Generator Creates

Our robots.txt generator creates a properly formatted robots.txt file following the Robots Exclusion Protocol (REP) standard defined in RFC 9309. The tool allows you to specify multiple user agents (like Googlebot, Bingbot, or wildcards for all bots), add Allow and Disallow directives to control crawler access to different paths, include Sitemap URLs to help search engines discover your content, and optionally add crawl-delay directives for bots that support them.

Important clarification: A robots.txt file controls crawling (whether search engines can access and download pages), not indexing (whether pages can appear in search results). Pages blocked in robots.txt may still appear in search results without descriptions if they have external links pointing to them. If you need to prevent pages from appearing in search results, use the noindex meta tag or X-Robots-Tag header instead.

Not a security tool: Robots.txt is publicly accessible and provides guidance to well-behaved crawlers, but it's not access control or authentication. It won't prevent malicious bots or humans from accessing pages. Never rely on robots.txt to protect sensitive content. Use proper authentication and security measures for truly private pages.

How to Create a Robots.txt File

Creating your robots.txt file takes just a few steps:

Step 1: Add User Agents

Specify which crawlers your rules apply to. Use "Googlebot" for Google's crawler, "Bingbot" for Bing, or asterisk (*) as a wildcard to apply rules to all crawlers. You can create separate rule sets for different user agents with specific access requirements.

Step 2: Add Allow and Disallow Directives

Specify which paths crawlers should avoid with Disallow directives or explicitly allow with Allow directives. For example, "Disallow: /admin/" prevents crawling of your admin directory, while "Allow: /admin/images/" would explicitly allow a subdirectory within a blocked path. Directives are processed in order of specificity, with longer matching paths taking precedence.

Step 3: Add Your Sitemap URL

Include a "Sitemap:" line with the full URL to your XML sitemap. This helps search engines discover all pages on your site. You can add multiple Sitemap lines if you have several sitemaps (like separate sitemaps for posts, pages, products, etc.).

Step 4: Generate and Upload

Click generate to create your robots.txt file. Copy the output and save it as "robots.txt" (all lowercase) as a plain UTF-8 text file. Upload this file to the root directory of your website so it's accessible at https://yourdomain.com/robots.txt. The file must be at the root level to work properly.

Step 5: Test Your Robots.txt

After uploading, verify your robots.txt file works correctly. Google Search Console's robots.txt report shows whether Google fetched and parsed your file and flags syntax errors, while the URL Inspection tool shows whether specific URLs are blocked by your rules, helping you catch unintended blocks.
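You can also sanity-check the generated rules locally before uploading. The sketch below uses Python's standard-library urllib.robotparser, which simulates how a compliant crawler reads the file (note it implements the original REP, so wildcard patterns and some edge cases may differ from Google's parser):

```python
from urllib.robotparser import RobotFileParser

# An example generated robots.txt, supplied as a list of lines
robots_lines = [
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://example.com/sitemap.xml",
]

rp = RobotFileParser()
rp.parse(robots_lines)

print(rp.can_fetch("*", "/admin/settings"))  # False: blocked by Disallow
print(rp.can_fetch("*", "/blog/post"))       # True: no rule matches
print(rp.site_maps())                        # ['https://example.com/sitemap.xml']
```

This catches obvious mistakes (like an accidental "Disallow: /") before the file ever reaches your server.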

Robots.txt Basics: Understanding Each Directive

Understanding how robots.txt works helps you create effective crawl rules:

The Robots Exclusion Protocol (REP) is the standard that defines robots.txt behavior. It's a convention that well-behaved crawlers follow, specifying how crawlers should interpret directives in robots.txt files. The protocol was formalized in RFC 9309 after decades of informal use.

User-agent directive specifies which crawler(s) the following rules apply to. Format: User-agent: name. Common user agents include "Googlebot" (Google's main crawler), "Bingbot" (Bing's crawler), "Slurp" (Yahoo's crawler), and "*" (wildcard matching all crawlers). You can create separate sections with different rules for different user agents. The User-agent line starts a new rule set, with all Disallow and Allow lines below it applying to that user agent until the next User-agent line.

Disallow directive tells crawlers not to access specified paths. Format: Disallow: /path/. For example, "Disallow: /admin/" blocks access to all URLs starting with "/admin/". "Disallow: /" blocks the entire site. "Disallow:" (with nothing after the colon) disallows nothing, effectively allowing everything. The path is case-sensitive and must start with a forward slash. Disallow applies to the specified path and all subpaths unless overridden by more specific rules.

Allow directive explicitly permits crawling of paths that might otherwise be blocked by a Disallow rule. Format: Allow: /path/. Allow is primarily useful for permitting access to specific subdirectories within blocked parent directories. For example, if you block "/admin/" but want to allow "/admin/public/", you can add "Allow: /admin/public/" before the Disallow rule. When both Allow and Disallow rules could apply to a URL, the most specific (longest matching) rule takes precedence.
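The precedence behavior can be checked with Python's standard-library urllib.robotparser. One caveat: this parser applies rules in file order (first match wins) rather than Google's longest-match rule, so list the Allow exception above the broader Disallow, as suggested above, to get the same result under both interpretations:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /admin/public/",  # more specific exception, listed first
    "Disallow: /admin/",      # broader block
])

print(rp.can_fetch("*", "/admin/public/logo.png"))  # True: exception applies
print(rp.can_fetch("*", "/admin/users"))            # False: blocked
```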

Sitemap directive specifies the location of your XML sitemap. Format: Sitemap: https://example.com/sitemap.xml. Include the full URL including https://. You can add multiple Sitemap lines if you have multiple sitemaps. The Sitemap directive helps search engines discover your content more efficiently but doesn't override crawl restrictions. Even if a URL is in your sitemap, it will be blocked if Disallow rules prevent access.

Crawl-delay directive attempts to limit crawler request frequency. Format: Crawl-delay: number (number in seconds). However, Google and many major search engines don't support crawl-delay in robots.txt. We'll cover this limitation in detail in the next section.

Scope rules are critical to understand: A robots.txt file only applies to the specific host, protocol, and port where it's located. A robots.txt at https://example.com/robots.txt doesn't control crawling of https://www.example.com (different subdomain), http://example.com (different protocol), or https://example.com:8080 (different port). Each unique host/protocol/port combination needs its own robots.txt file at its root directory.
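The scope rule can be sketched in a few lines: the governing robots.txt URL is derived from the page's scheme, host, and port only, which is why each combination needs its own file. The helper name robots_url is just for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs crawling of page_url.

    Scheme, host, and port must all match, so each unique
    combination gets its own robots.txt at the root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/shop/item?id=1"))
# https://example.com/robots.txt
print(robots_url("https://www.example.com:8080/shop/"))
# https://www.example.com:8080/robots.txt
```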

Common Robots.txt Templates

Here are ready-to-use robots.txt templates for common scenarios:

Allow all crawlers (minimal robots.txt):

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

This is the most permissive configuration, allowing all crawlers access to all content. Many sites use this simple robots.txt primarily to communicate their sitemap location.

Block all crawlers (staging sites, development):

User-agent: *
Disallow: /

Warning: This blocks all crawling and will cause your entire site to stop being crawled by search engines. Only use this for development, staging, or sites you genuinely want completely excluded from search engines. Never accidentally deploy this to production.

Block admin and private areas:

User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /private/
Disallow: /account/
Disallow: /checkout/
Disallow: /cart/

Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml

This blocks common admin areas, account pages, and ecommerce checkout flows while explicitly allowing WordPress admin-ajax.php which needs to be accessible for some site functionality. Adjust paths to match your site structure.

Block internal search and filtered URLs:

User-agent: *
Disallow: /*?s=
Disallow: /*?q=
Disallow: /search/
Disallow: /*?filter=
Disallow: /*?sort=
Disallow: /*?page=

Sitemap: https://example.com/sitemap.xml

This prevents crawling of search results pages and filtered/sorted URLs that often create duplicate content issues. The asterisk-plus-question-mark pattern blocks any URL containing that query parameter, wherever it appears.
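Python's built-in robots.txt parser doesn't support these wildcard patterns, but the matching rule itself is simple. Here's a sketch of RFC 9309-style pattern matching ('*' matches any character sequence, a trailing '$' anchors the end); the function name is illustrative:

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check a robots.txt path pattern against a URL path.

    Supports '*' (any sequence of characters) and a trailing '$'
    (anchors the match at the end), per RFC 9309. A sketch of the
    matching rules, not a full robots.txt parser.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*'
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*?s=", "/blog?s=query"))  # True
print(robots_pattern_matches("/*?s=", "/blog/post"))     # False
print(robots_pattern_matches("/*.pdf$", "/docs/a.pdf"))  # True
```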

Allow rendering resources (CSS/JS needed by Google):

User-agent: *
Disallow: /admin/
Disallow: /private/

Allow: /wp-content/themes/
Allow: /wp-content/plugins/
Allow: /assets/
Allow: /static/

Sitemap: https://example.com/sitemap.xml

This example blocks private areas but explicitly allows paths containing CSS, JavaScript, and images that Google needs to render pages properly. Historically, some sites blocked /wp-content/ or /assets/ directories entirely, which prevented Google from rendering pages correctly. Allow these resources while blocking truly private content.

Multiple user agents with different rules:

User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /private/
Disallow: /heavy-resource-page/

User-agent: *
Disallow: /

Sitemap: https://example.com/sitemap.xml

This creates separate rules for different crawlers. Google and Bing can access most content, but Bing is blocked from a resource-heavy page. All other crawlers are blocked entirely. Each User-agent section operates independently.
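A multi-agent file like the one above can be sanity-checked with Python's standard-library urllib.robotparser, which picks the section matching each crawler's user agent and falls back to the "*" section for everything else:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /private/",
    "",
    "User-agent: Bingbot",
    "Disallow: /private/",
    "Disallow: /heavy-resource-page/",
    "",
    "User-agent: *",
    "Disallow: /",
])

print(rp.can_fetch("Googlebot", "/heavy-resource-page/"))  # True
print(rp.can_fetch("Bingbot", "/heavy-resource-page/"))    # False
print(rp.can_fetch("SomeOtherBot", "/blog/"))              # False: falls to *
```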

Crawl-Delay: When It Matters (And When It Doesn't)

The crawl-delay directive is commonly misunderstood:

Google does not support crawl-delay in robots.txt. Google's crawlers ignore crawl-delay directives entirely and manage crawl rate through their own internal algorithms, backing off automatically when your server responds slowly or returns 503 or 429 errors (the old Search Console crawl-rate setting has been retired). Adding crawl-delay directives for Googlebot will have no effect.

Some other crawlers do honor crawl-delay. Bing and some other search engines respect crawl-delay, interpreting it as a minimum number of seconds to wait between requests to the same site. For example, "Crawl-delay: 10" would ask supporting crawlers to wait at least 10 seconds between requests.

When to use crawl-delay: If you're experiencing server load issues from specific crawlers (not Googlebot) and want to slow them down, you can add crawl-delay directives to those specific user-agent sections. Check whether the problematic crawler respects crawl-delay before relying on it.

Example with crawl-delay for non-Google bots:

User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Crawl-delay: 5
Disallow: /private/

User-agent: *
Crawl-delay: 10
Disallow: /private/

Sitemap: https://example.com/sitemap.xml

This applies no crawl-delay to Google (since Google ignores it anyway), a 5-second delay to Bing, and a 10-second delay to all other crawlers. Remember that this only works for crawlers that respect the directive.
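You can read back what a file declares per agent with urllib.robotparser's crawl_delay method, which simply reports the value from the matching section (it can't make Googlebot honor it):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /private/",
    "",
    "User-agent: Bingbot",
    "Crawl-delay: 5",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /private/",
])

print(rp.crawl_delay("Bingbot"))       # 5
print(rp.crawl_delay("Googlebot"))     # None: no delay in that section
print(rp.crawl_delay("SomeOtherBot"))  # 10: falls back to the * section
```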

Better crawl rate control: Google adjusts its crawl rate automatically; if Googlebot is overloading your server, temporarily returning 503 or 429 responses signals it to slow down. For truly aggressive or misbehaving crawlers, use server-level rate limiting or blocking rather than relying on robots.txt directives they may ignore.

Robots.txt vs Noindex: Critical Distinction

This is one of the most common sources of confusion in SEO:

Robots.txt controls crawling, not indexing. When you disallow a URL in robots.txt, you're telling crawlers "don't download this page." However, if the URL has external links pointing to it, search engines may still include it in search results based on anchor text and other signals, just without a description or snippet. The search result might show "A description for this result is not available because of this site's robots.txt."

Noindex controls indexing. To actually prevent a page from appearing in search results, use a noindex meta tag (<meta name="robots" content="noindex">) in the page's HTML head, or use an X-Robots-Tag HTTP header. These directives tell search engines "don't include this page in search results" even if they crawl it.

The catch-22 problem: If you block a page with robots.txt, search engines can't crawl it to see the noindex directive. This means if a page is already indexed and you add it to robots.txt, search engines can't access the page to discover you want it noindexed, potentially keeping it in search results longer. If you want a page removed from search results, ensure it's crawlable but has noindex applied, or use robots.txt but understand the limitations.

Correct approach for removing pages from search results:

  1. Remove robots.txt blocks that prevent crawling the page
  2. Add a noindex meta tag or X-Robots-Tag header to the page
  3. Wait for search engines to recrawl and process the noindex directive
  4. After the page is removed from search results, you can optionally block it with robots.txt if you want to prevent further crawling
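As a rough aid for step 2, a script can check whether a page currently carries a noindex directive in either the X-Robots-Tag header or a robots meta tag. This is a simplified sketch (the has_noindex helper is illustrative; a real crawler parses HTML properly and handles per-bot directives):

```python
import re

def has_noindex(headers: dict, html: str) -> bool:
    """Rough check for a noindex directive on a page.

    Inspects an X-Robots-Tag response header and robots meta tags
    in the HTML source.
    """
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return True
    for tag in re.findall(r"<meta[^>]+>", html, re.IGNORECASE):
        if re.search(r'name=["\']robots["\']', tag, re.IGNORECASE) and \
           "noindex" in tag.lower():
            return True
    return False

page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex({}, page))                         # True: meta tag
print(has_noindex({"X-Robots-Tag": "noindex"}, ""))  # True: header
print(has_noindex({}, "<html><head></head></html>")) # False
```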

When to use robots.txt vs noindex:

  • Use robots.txt to manage server resources and crawl budget by blocking low-value pages you don't want crawlers wasting time on
  • Use robots.txt to prevent duplicate content crawling (like filtered/sorted URLs)
  • Use noindex when you want pages accessible to users but not in search results (like thank you pages, internal search results, user account pages)
  • Use both together carefully: pages you noindex should generally remain crawlable so search engines can see and respect the noindex directive

Troubleshooting Common Robots.txt Issues

Here are solutions to frequent robots.txt problems:

"My whole site disappeared from Google after updating robots.txt" almost always means you accidentally added "Disallow: /" to the wrong user-agent section, blocking all crawling. Check your robots.txt file immediately. Look for "User-agent: *" followed by "Disallow: /" or "User-agent: Googlebot" with "Disallow: /". If you find either, this is blocking your entire site from being crawled. Remove the overly broad Disallow directive, save the corrected file, and upload it to your root directory. Request reindexing through Google Search Console. Full recovery may take days to weeks depending on your site's crawl frequency.

"Robots.txt not working or being ignored" usually indicates the file is in the wrong location or has incorrect formatting. Verify the file is named exactly "robots.txt" (lowercase, no capital letters) and is located at the root directory of your domain: https://yourdomain.com/robots.txt, not in a subdirectory. Check that you're testing on the correct subdomain and protocol (www vs non-www, http vs https). Verify the file is plain text, not a Word document or other format. Use Google Search Console's robots.txt tester to identify syntax errors. Common syntax issues include missing colons after directives, typos in "Disallow" or "User-agent", or incorrect path formatting.

"I want a page removed from Google search results" requires noindex, not robots.txt alone. If you only block a page with robots.txt, it may remain in search results without a description. Correct approach: ensure the page is crawlable (not blocked by robots.txt), add a noindex meta tag to the page, wait for Google to recrawl and process the noindex, verify removal in Google Search Console, then optionally block with robots.txt if desired. For faster removal of specific URLs, use Google Search Console's URL Removal tool for temporary removal while waiting for noindex to take effect.

"Crawl-delay not honored by Google" is expected behavior. Google does not support crawl-delay in robots.txt and never has. Google manages crawl rate through its own algorithms. To adjust Google's crawl rate, use Google Search Console's crawl rate settings rather than robots.txt. For other crawlers that do support crawl-delay, verify you've added the directive to the correct user-agent section and that the crawler actually respects it.

"Google can't fetch my robots.txt file" prevents crawling entirely. Common causes include robots.txt returning error status codes (like 404, 500, or 503), server blocking Googlebot's user agent, robots.txt being too large (Google has limits around 500KB), or DNS or network issues preventing access to your server. Check Google Search Console for specific error messages about robots.txt fetching. Verify your robots.txt is accessible by visiting https://yourdomain.com/robots.txt in a browser. If you see it but Google can't, check server logs for Googlebot access attempts.

"Some URLs are blocked but others aren't when they should both match my rules" indicates path matching confusion. Remember that paths are case-sensitive: "Disallow: /Admin/" doesn't block "/admin/". Longer, more specific rules override shorter ones, so "Allow: /admin/public/" before "Disallow: /admin/" will allow the /admin/public/ subdirectory. Use Google Search Console's robots.txt tester to check whether specific URLs are blocked by your rules. Pay attention to trailing slashes, as "Disallow: /admin" (no trailing slash) blocks "/admin" and everything starting with "/admin", while patterns with trailing slashes only block directories.

"Crawlers are still accessing blocked pages" could mean several things. Well-behaved search engine crawlers should respect robots.txt within hours of seeing updates, but some crawlers ignore robots.txt entirely, especially scrapers and malicious bots. Robots.txt is a suggestion, not enforcement. For truly unwanted traffic, use server-level blocking (IP blocks, user-agent blocking, rate limiting) rather than relying solely on robots.txt. Check your server logs to identify which crawlers are ignoring your rules.

Frequently Asked Questions

What is a robots.txt file and what does it do?

A robots.txt file is a plain text file placed at the root of your website (https://yourdomain.com/robots.txt) that provides instructions to search engine crawlers and other bots about which pages or sections they should or shouldn't crawl. It follows the Robots Exclusion Protocol (REP), a standard defined in RFC 9309. Robots.txt controls crawling (access to pages), not indexing (appearance in search results). Well-behaved crawlers check robots.txt before accessing other pages on your site and follow the directives you specify. Common uses include blocking admin areas, managing crawl budget on large sites, preventing duplicate content crawling, and communicating sitemap locations.

Where do I put the robots.txt file on my website?

The robots.txt file must be placed at the root directory of your website domain and must be accessible at https://yourdomain.com/robots.txt (or http://yourdomain.com/robots.txt if using HTTP). It cannot be in a subdirectory like /admin/robots.txt or /content/robots.txt. The file must be named exactly "robots.txt" in lowercase. Each unique host, protocol, and port combination needs its own robots.txt file. For example, https://example.com, https://www.example.com, and http://example.com would each need separate robots.txt files at their respective roots since they're considered different hosts or protocols.

Does robots.txt prevent pages from being indexed by search engines?

No, robots.txt controls crawling (whether search engines can access and download pages), not indexing (whether pages appear in search results). Pages blocked by robots.txt may still appear in search results if external links point to them, though they'll typically show without descriptions or snippets. If you want to prevent pages from appearing in search results, use the noindex meta tag (<meta name="robots" content="noindex">) in the page's HTML or an X-Robots-Tag HTTP header. The noindex directive tells search engines to exclude the page from search results even if they crawl it.

How do I block a folder like /admin/ with robots.txt?

Add a Disallow directive with the folder path to your robots.txt file. Format: Disallow: /admin/. This blocks crawling of all URLs starting with /admin/, including all subdirectories and files within. The path is case-sensitive, so this won't block /Admin/ (capital A). The trailing slash indicates a directory. Example robots.txt:

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

This tells all crawlers (*) not to access anything in the /admin/ directory. Remember this blocks crawling but doesn't prevent indexing or provide security protection. Use proper authentication for truly private admin areas.

How do Allow and Disallow directives work together?

When both Allow and Disallow rules could apply to a URL, the most specific (longest matching) rule takes precedence. Allow directives are primarily useful for permitting access to subdirectories within blocked parent directories. For example, if you have "Disallow: /admin/" but want to allow "/admin/public/", add "Allow: /admin/public/" as well. The more specific /admin/public/ path takes precedence over the less specific /admin/ path regardless of where the lines appear; listing Allow exceptions first is still a good convention, since some older parsers process rules in file order. When an Allow and a Disallow rule match with equal specificity, the least restrictive rule applies (Allow wins over Disallow).

How do I add my XML sitemap to robots.txt?

Add a Sitemap directive with the full URL to your sitemap. Format: Sitemap: https://example.com/sitemap.xml. Include the complete URL including https:// or http://. The Sitemap line can appear anywhere in the robots.txt file and applies to all user agents, not just the one above it. You can add multiple Sitemap lines if you have multiple sitemaps (like separate sitemaps for posts, pages, products, videos, etc.). Example:

Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-pages.xml

While adding your sitemap to robots.txt helps search engines discover it, also submit sitemaps directly through Google Search Console and Bing Webmaster Tools for best results.

Does Google support crawl-delay in robots.txt?

No, Google does not support or honor the crawl-delay directive in robots.txt. Google's crawlers completely ignore crawl-delay and manage crawl rate through their own algorithms, backing off automatically when your server responds slowly or returns 503/429 errors. Some other search engines like Bing do respect crawl-delay, so including it may affect those crawlers but will have zero impact on Googlebot. If you're experiencing server load issues from Google's crawling, address it through server capacity improvements or temporary 503/429 responses, not robots.txt.

Can I use robots.txt to hide private or sensitive pages?

No, robots.txt is not a security mechanism and should never be used to protect truly private or sensitive content. Robots.txt files are publicly accessible to anyone, including malicious actors, and only provide guidance to well-behaved crawlers. Bad actors and malicious bots can and do ignore robots.txt entirely. Blocking pages in robots.txt may actually draw attention to them by publicly advertising their existence. For truly private content, use proper authentication (passwords, login requirements), server-level access controls, or don't put the content on public-facing web servers at all. Robots.txt is for managing crawler behavior, not enforcing security.

How do I test my robots.txt file in Google Search Console?

Log into Google Search Console, select your property, and open the robots.txt report (found under Settings > Crawling in current Search Console versions; the older standalone robots.txt Tester has been retired). The report shows which robots.txt files Google found for your property, when they were last crawled, and any warnings or parse errors. To check whether a specific URL is blocked, use the URL Inspection tool: enter the URL and its crawl status will indicate whether robots.txt rules prevent Googlebot from fetching it.

Why is my page still showing in Google search results after I added it to robots.txt?

Because robots.txt controls crawling, not indexing. When you block a URL with robots.txt, search engines can't crawl it to see what's there, but if the URL was already indexed or has external links pointing to it, it may remain in search results. The listing typically appears without a description, showing a message like "A description is not available because of this site's robots.txt." To actually remove pages from search results, use a noindex meta tag or X-Robots-Tag header, not robots.txt alone. The correct process: remove robots.txt blocks, add noindex to the page, wait for recrawling, then optionally re-block with robots.txt after the page is deindexed.

Should I block CSS and JavaScript files in robots.txt?

Generally no, you should not block CSS and JavaScript files that Google needs to render your pages properly. Historically, some sites blocked /wp-content/, /assets/, or /scripts/ directories thinking these weren't important for SEO. However, Google needs to load CSS and JavaScript to render pages as users see them, which affects Google's understanding of your content and user experience signals. Blocking rendering resources can cause Google to misinterpret your pages. Allow paths like /wp-content/themes/, /wp-content/plugins/, /assets/, /static/, and similar directories containing CSS, JS, and images. Block truly private areas like /admin/ or /private/ but allow resources needed for rendering.

Do I need a robots.txt file if my site is small?

Not necessarily. Many small sites operate perfectly fine without robots.txt files or with minimal robots.txt files that only specify sitemap locations. If you have no areas you need to block from crawling and search engines are crawling your site appropriately, a robots.txt file is optional. However, even small sites benefit from including a Sitemap directive in robots.txt to help search engines discover content efficiently. A simple robots.txt with just a sitemap line is perfectly valid and often sufficient for small sites. Complex robots.txt rules are more important for large sites with crawl budget concerns, extensive admin areas, or duplicate content issues from filtered URLs.

Free tools for everyday tasks, from quick text fixes to image edits, SEO checks, and calculators. No sign-up needed. Fast, private, and easy to use.

© 2026 Tool Point. All rights reserved.