“Indexed, though blocked by robots.txt” shows in Google Search Console when Google has indexed URLs that it wasn’t allowed to crawl because your website’s robots.txt file blocked them.
Most of the time, this will be a simple problem caused by your robots.txt file blocking crawling.
However, a few other conditions can also trigger it, so let’s work through the following troubleshooting process to diagnose and fix the problem as efficiently as possible:
If you don’t want the URL indexed, add a noindex meta robots tag and make sure crawling is allowed, assuming the page is canonical.
Google may still index a page that you block from being crawled because crawling and indexing are distinct processes.
If Google can’t crawl the page, it won’t see the noindex meta tag, and it may still index the page because other pages link to it.
If the URL canonicalizes to another page, don’t add a noindex meta robots tag. Instead, make sure crawling is allowed and that the proper canonicalization signals are in place, including a canonical tag on the canonical page, so the signals can pass and consolidate.
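To double-check what a crawler would actually see once it’s allowed to fetch the page, here is a minimal Python sketch (not part of the original walkthrough) that prints a page’s meta robots and canonical tags. It assumes the third-party requests library is installed, and the URL is a placeholder.

    # Minimal sketch: report the meta robots and canonical tags a crawler would see.
    # Assumes "requests" is installed; the URL below is a placeholder.
    from html.parser import HTMLParser
    import requests

    class RobotsMetaParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.robots = None     # content of <meta name="robots" ...>
            self.canonical = None  # href of <link rel="canonical" ...>

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name", "").lower() == "robots":
                self.robots = attrs.get("content", "")
            if tag == "link" and attrs.get("rel", "").lower() == "canonical":
                self.canonical = attrs.get("href", "")

    url = "https://example.com/some-page/"  # placeholder
    parser = RobotsMetaParser()
    parser.feed(requests.get(url, timeout=10).text)
    print("meta robots:", parser.robots)    # e.g. "noindex, follow", or None if missing
    print("canonical:", parser.canonical)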
If you do want the URL indexed, you need to figure out why Google can’t crawl it and remove the block. The most likely cause is a crawl block in robots.txt.
However, there are a few other scenarios where you may see messages saying you’re blocked. Let’s go through them in the order you should probably look for them:
Check for a crawl block in robots.txt
Check for intermittent blocks
Check for user-agent blocks
Check for IP blocks

Check for a crawl block in robots.txt
The easiest way to spot the issue is with the robots.txt tester in GSC, which will flag the blocking rule. If you know what you’re looking for, or you don’t have access to GSC, you can navigate to domain.com/robots.txt to find the file.
Our robots.txt article has more details, but you’re looking for a disallow statement like:

Disallow: /

It may block everyone or mention a specific user-agent.
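If you’d rather test a specific URL programmatically, here is a rough Python sketch using the standard library’s robots.txt parser. The domain and path are placeholders, and keep in mind that this parser’s matching rules are close to, but not guaranteed to be identical to, Google’s.

    # Check whether a URL is blocked for a given user-agent by the live robots.txt.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://domain.com/robots.txt")  # placeholder domain
    rp.read()  # fetches and parses the live file

    for ua in ("Googlebot", "*"):
        allowed = rp.can_fetch(ua, "https://domain.com/blocked-page/")
        print(ua, "allowed" if allowed else "blocked")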
If your website is brand-new or recently launched, you may want to look for:

User-agent: *
Disallow: /

Can’t find an issue?
It’s possible that someone already fixed the robots.txt block and resolved the issue before you started investigating.
That’s the best-case scenario. However, if the problem appears to be fixed but returns shortly afterward, you may have an intermittent block.
How to fix it
You need to remove the disallow statement that is causing the block. How you do this depends on the technology you use.

WordPress
If the issue impacts your entire website, the most likely cause is that a setting in WordPress that prevents indexing has been checked.
This error is common on new websites and websites that have been moved.
Follow these steps to check for it:
Click Settings. Click Reading. Make sure the Search Engine Visibility option is unchecked.
WordPress with Yoast
If you’re using the Yoast SEO plugin, you can edit the robots.txt file directly to remove the blocking statement. Click Yoast SEO. Click Tools. Click File editor.

WordPress with Rank Math
Like Yoast, Rank Math lets you edit the robots.txt file directly. Click Rank Math. Click General Settings. Click Edit robots.txt.

FTP or hosting
If you have FTP access to the site, you can edit the robots.txt file directly to remove the disallow statement that is causing the problem. Your hosting provider may also give you access to a File Manager that lets you reach the robots.txt file directly.
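Whichever method you use, it’s worth confirming the change is actually live. The short Python sketch below (placeholder domain, requests assumed installed) fetches the public robots.txt and flags any site-wide disallow that remains.

    # Fetch the live robots.txt and flag any site-wide "Disallow: /" rule.
    import requests

    robots = requests.get("https://domain.com/robots.txt", timeout=10).text  # placeholder domain
    print(robots)

    site_wide = [line for line in robots.splitlines()
                 if line.strip().lower().replace(" ", "") == "disallow:/"]
    print("Site-wide disallow still present:", bool(site_wide))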
Check for intermittent blocks
Because the conditions causing the block may not always be present, intermittent issues can be more difficult to resolve.
My recommendation is to look at the history of your robots.txt file. For example, the GSC robots.txt tester has a dropdown that lets you select previous versions of the file and view their contents.
The Wayback Machine on archive.org also keeps a history of the robots.txt files for the websites it crawls. You can click any of the dates it has data for and see what the file contained on that day.
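If you prefer to pull that history programmatically, the sketch below queries the Wayback Machine’s public CDX API for archived copies of a robots.txt file. The domain is a placeholder and requests is assumed to be installed.

    # List archived snapshots of a robots.txt file from the Wayback Machine CDX API.
    import requests

    resp = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={"url": "domain.com/robots.txt", "output": "json", "limit": "20"},
        timeout=30,
    )
    rows = resp.json() if resp.text.strip() else []
    if rows:
        header, snapshots = rows[0], rows[1:]
        for snap in snapshots:
            record = dict(zip(header, snap))
            # Each copy is viewable at web.archive.org/web/<timestamp>/<original>
            print(record["timestamp"], record["original"], record["statuscode"])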
Alternatively, you can use the beta version of their Changes report, which lets you easily compare the content of two versions.

How to fix it
How you fix intermittent blocks depends on what is causing the issue.
For example, one possible cause is a shared cache between a test environment and a live environment. When the cache from the test environment is active, the robots.txt file may include a blocking directive; when the live environment’s cache is active, the site may be crawlable. In this case, you would want to split the cache, or perhaps exclude .txt files from the cache in the test environment.
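If you’re not sure what is flipping the file back and forth, one way to catch an intermittent block in the act is to poll robots.txt and log whenever its content changes. A rough sketch, with a placeholder domain and polling interval, and requests assumed installed:

    # Poll robots.txt and print its content each time it changes.
    import hashlib
    import time
    import requests

    URL = "https://domain.com/robots.txt"  # placeholder domain
    last_hash = None

    while True:
        body = requests.get(URL, timeout=10).text
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest != last_hash:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "robots.txt changed:")
            print(body)
            last_hash = digest
        time.sleep(300)  # check every 5 minutes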
Check for user-agent blocks
A user-agent block is when a site blocks a specific user-agent, such as Googlebot or AhrefsBot. In other words, the site detects a specific bot and blocks the corresponding user-agent.

If you can view a page fine in your normal browser but get blocked after changing your user-agent, it means the specific user-agent you entered is blocked.
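One way to run that comparison is with a short script that requests the same URL with different User-Agent headers and compares the responses. The sketch below uses a placeholder URL and illustrative user-agent strings, with requests assumed installed; a 200 for the browser string and a 403 for a bot string, for example, points to a user-agent block.

    # Compare responses for a browser-like user-agent versus bot user-agents.
    import requests

    url = "https://domain.com/some-page/"  # placeholder
    user_agents = {
        "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "ahrefsbot": "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    }
    for name, ua in user_agents.items():
        r = requests.get(url, headers={"User-Agent": ua}, timeout=10)
        print(name, "->", "HTTP", r.status_code)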
How to fix it
Unfortunately, this is another issue where the fix depends on where you find the block. Many different systems can block bots, including .htaccess, the server configuration, firewalls, your CDN, or even something your hosting provider controls that you may not be able to see.
Contacting your CDN or hosting provider to inquire about the source of the block and the best course of action might be your best option.
Check for IP blocks
If you have verified that you are not blocked by robots.txt and have ruled out user-agent blocks, then an IP block is likely the problem.
IP blocks are difficult issues to track down. As with user-agent blocks, your best bet may be to contact your CDN or hosting provider and ask where the block is coming from and how to resolve it.
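One diagnostic that can help before you contact them, based on Google’s documented reverse-DNS method for verifying Googlebot, is to check whether an IP you see being blocked or rejected in your server logs really belongs to Googlebot. A hedged sketch; the IP below is only an example and should come from your own logs.

    # Verify whether an IP belongs to Googlebot via reverse DNS plus forward confirmation.
    import socket

    def is_googlebot_ip(ip):
        try:
            hostname = socket.gethostbyaddr(ip)[0]             # reverse DNS lookup
            if not hostname.endswith((".googlebot.com", ".google.com")):
                return False
            return ip in socket.gethostbyname_ex(hostname)[2]  # forward-confirm the IP
        except (socket.herror, socket.gaierror):
            return False

    print(is_googlebot_ip("66.249.66.1"))  # example IP; use addresses from your logs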