How to Fix the “Blocked by robots.txt” Issue in Google Search Console

What is robots.txt?

  • A text file located at the root of your website.
  • Tells search engine crawlers such as Googlebot which URLs they may and may not crawl.
  • Uses directives such as “User-agent,” “Disallow,” and “Allow” to control access, as in the short example below.
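
For reference, a minimal robots.txt could look like the sketch below; the “/private/” paths are placeholders, not recommendations for your site:

  User-agent: *
  Disallow: /private/
  Allow: /private/public-page.html

Here “User-agent: *” applies the rules to every crawler, “Disallow” blocks the /private/ directory, and “Allow” carves a single page back out of it.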

Understanding Which URLs to Block

Many types of URLs don’t need to appear in search results, and letting Googlebot crawl them can waste your site’s crawl budget. These often include:

  • Dynamic URLs: URLs with parameters like search terms, filters, or product variations.
  • Account-specific URLs: Pages related to user accounts, shopping carts, or checkout processes.

These URLs can be blocked from crawling with a robots.txt file.
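
As a sketch, the rules below block crawling of hypothetical parameterized and account-related URLs; the parameter names and paths are examples only and should be matched to your own URL structure:

  User-agent: *
  Disallow: /*?sort=
  Disallow: /*?filter=
  Disallow: /cart/
  Disallow: /checkout/
  Disallow: /account/

Googlebot understands the “*” wildcard in these patterns, but not every crawler does, so keep an eye on how other search engines treat such rules.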

(Screenshot: URLs reported under “Blocked by robots.txt” in Google Search Console)

When to Fix Blocked URLs

While it’s generally okay to block the above types of URLs, sometimes important pages might accidentally be blocked. This can negatively impact your website’s visibility in search results.

Steps to Fix the Issue

Identify the Affected URLs:

  • Use Google Search Console’s “Pages” (page indexing) report, formerly called “Index Coverage,” to find URLs with the “Blocked by robots.txt” status.
  • Inspect the affected URLs to determine if they should be accessible to Googlebot.

Access and Edit Your robots.txt File:

  • Use an FTP client or your website’s content management system (CMS) to access the file.
  • Make a backup copy of the file before making any changes.

Analyze the robots.txt File:

  • Check for directives that might be blocking Googlebot from accessing the affected URLs.
  • Common culprits include overly broad “Disallow” directives or syntax errors, as in the example below.
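
For instance, a single overly broad rule (the paths here are placeholders) can take far more of the site offline for crawlers than intended:

  # Too broad: “Disallow: /” blocks every URL on the site
  User-agent: *
  Disallow: /

  # Also broad: blocks the entire /blog/ section,
  # not just the drafts you meant to hide
  Disallow: /blog/

In both cases a narrower path, such as /blog/drafts/, would keep the rest of the section crawlable.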

Make Necessary Changes:

  • If you want Googlebot to access the URLs, remove or modify the relevant “Disallow” directives.
  • For example, if the directive is Disallow: /products/, you can narrow it to Disallow: /products/sale/ so the other product pages become crawlable, as shown below.
  • Ensure proper syntax and avoid using wildcard characters excessively.
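
Continuing the hypothetical /products/ example from the list above, the edit might look like this:

  # Before: every product page is blocked
  User-agent: Googlebot
  Disallow: /products/

  # After: only the sale subdirectory is blocked,
  # the rest of /products/ is crawlable
  User-agent: Googlebot
  Disallow: /products/sale/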

Test Your Changes:

  • Use a robots.txt testing tool, such as the robots.txt report in Search Console or the local check sketched below, to verify that the changes have the desired effect.
  • Check if Googlebot can now access the previously blocked URLs.
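
If you want a quick local check alongside Google’s own tools, Python’s standard-library robots.txt parser can report whether a given user agent may fetch a URL. It is only a rough approximation, since it does not implement every Googlebot extension (wildcard handling in particular can differ), and the domain and URLs below are placeholders:

  from urllib.robotparser import RobotFileParser

  # Load the live robots.txt file (placeholder domain)
  parser = RobotFileParser()
  parser.set_url("https://www.example.com/robots.txt")
  parser.read()

  # URLs that Search Console reported as "Blocked by robots.txt" (placeholders)
  urls_to_check = [
      "https://www.example.com/products/widget-123",
      "https://www.example.com/checkout/",
  ]

  for url in urls_to_check:
      allowed = parser.can_fetch("Googlebot", url)
      print(f"{url} -> {'allowed' if allowed else 'blocked'} for Googlebot")

URLs that pass this local check should still be confirmed with the URL Inspection tool in Search Console before treating the issue as resolved.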

Submit the Updated robots.txt File:

  • Upload the modified robots.txt file to your website’s root and confirm that the live version reflects your changes (see the quick check below).
  • Allow some time for Googlebot to re-crawl your website and update its index.
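
Once the file is uploaded, it is worth confirming that the version served at your domain root actually contains the new rules. A quick fetch such as the sketch below (placeholder domain) is enough:

  from urllib.request import urlopen

  # Print the robots.txt that crawlers actually see (placeholder domain)
  print(urlopen("https://www.example.com/robots.txt").read().decode("utf-8"))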

Additional Tips

  • Be Specific: Use precise directives to control access to specific pages or directories.
  • Avoid Overblocking: Blocking too many pages can negatively impact your website’s visibility.
  • Use “Allow” Directives: If you want to explicitly allow access to certain pages inside a disallowed directory, use “Allow” directives (see the snippet after this list).
  • Submit a Sitemap: Submit an XML sitemap to Google Search Console to help Google discover and prioritize important pages.
  • Monitor for Changes: Regularly review your robots.txt file and Google Search Console to ensure it’s working as intended.
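
As an illustration of the “Allow” tip above, a more specific “Allow” rule can carve a single page out of a broader “Disallow” (the paths are placeholders):

  User-agent: *
  Disallow: /downloads/
  Allow: /downloads/catalog.pdf

For Googlebot, the most specific matching rule (the longest path) wins, so the one PDF stays crawlable while the rest of the directory remains blocked.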

Example robots.txt:

  User-agent: Googlebot
  Disallow: /admin/
  Disallow: /checkout/
  Allow: /images/

This example allows Googlebot to access all pages except those in the “/admin/” and “/checkout/” directories. It also explicitly allows access to images.

Remember: Carefully review your website’s structure and goals before making changes to your robots.txt file. Incorrect modifications can have unintended consequences.

By following these steps, you can effectively fix the “Blocked by robots.txt” issue and improve your website’s visibility in search results.