Google’s John Mueller answered a question about the curious situation of Search Console reporting thousands of URLs as indexed despite being blocked by robots.txt. Mueller helped explain how this happens and what to do about it.
Content Indexed Despite Being Blocked By Robots.txt
A Redditor asked for advice because Google Search Console was reporting more than 51,000 pages under the status “Indexed, though blocked by robots.txt.” The affected URLs were mainly WooCommerce product URLs containing add-to-cart URL parameters like “?add-to-cart=”.
Because the issue appeared suddenly, the site owner questioned whether the robots.txt rules themselves were responsible for creating the problem. They also wanted to know whether removing the rules would help Google process the canonical signals and eliminate the reported URLs from Search Console.
The person asked:
“I have WooCommerce site and suddenly since last month we are facing this issue: “Indexed, though blocked by robots.txt”
there are total “Affected pages 51K pages”
in the end of url I see mostly ?page&post_type=product&product=slug&add-to-cart=98063,
After inspecting those urls I found they have index tag setup and robots.txt had
* Disallow: /*?add-to-cart=
* Disallow: /*?*add-to-cart=
I removed those 2 rules from robots.txt and hoping those pages fixed cause they have canonical set to correct product, will that fix issue?
or should I also setup noindex rules? will that cause us our crawl budget? it is pretty big woocommerce site, let me know guys your thoughts if someone has experience fixing such issue? and what will be the correct method without preventing our SEO or functionality loss.”
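To see how the two Disallow patterns in the question relate to the example URL, here is a minimal Python sketch. It assumes Google-style wildcard matching, where “*” matches any sequence of characters and a trailing “$” anchors the end of the URL. The function names are hypothetical, and the sketch ignores Allow rules, user-agent groups, and rule precedence, so it is an illustration rather than a full robots.txt parser.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the match at the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_patterns: list) -> bool:
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(
        robots_pattern_to_regex(p).match(path_and_query) is not None
        for p in disallow_patterns
    )


rules = ["/*?add-to-cart=", "/*?*add-to-cart="]
example = "/?page&post_type=product&product=slug&add-to-cart=98063"

print(is_disallowed(example, rules[:1]))  # first rule alone
print(is_disallowed(example, rules))      # both rules together
```

In this sketch, the first rule only matches when “add-to-cart” is the first parameter after the question mark, while the second rule also catches it when other parameters come first, as in the URL from the question.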
Google Says Add-To-Cart URLs Don’t Need To Be Indexed
Mueller responded that the add-to-cart URLs do not need to be indexed and that blocking them through robots.txt is an acceptable approach.
He explained that even when Google reports those URLs as indexed, they are unlikely to appear in normal search results because they are blocked by robots.txt. According to Mueller, users generally do not search for those URLs directly, making them poor candidates for search visibility.
John Mueller responded:
“You don’t need the add-to-cart URLs indexed. Blocking them with robots.txt is fine. Even if they get “indexed” since they’re blocked by robots.txt, it’s unlikely that they’ll be shown in search (unless you do specific queries for those URLs, which users don’t do).”
I’m kind of on the fence about what Mueller said about “robots.txt” making it “unlikely” that the URLs will be shown in Search. The reason is that robots.txt does not prevent a web page from showing in Google Search. It just prevents Googlebot from crawling those pages. So technically, that’s not quite correct and I’m a little surprised Mueller would say that.
Noindex Is Probably Not A Solution
One of the Redditors who responded to that question suggested the solution of adding a noindex robots tag to the parameterized URLs. But that may not be a viable solution because the pages with and without the URL parameters are essentially the same thing. They’re rendered using the same template for a specific page. So unless WooCommerce treats them differently and can render the parameterized URLs with a noindex and the regular page without the noindex, that’s not a real solution.
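For illustration only, this is roughly what “treating the parameterized URLs differently” would have to look like on the server side. The Python sketch below is not WooCommerce code, and the function name is hypothetical. It returns an X-Robots-Tag: noindex response header only when the requested URL carries an add-to-cart parameter. Note that Google has to be able to crawl a URL to see a noindex signal, so this would have no effect while the same URLs remain blocked in robots.txt.

```python
from urllib.parse import urlsplit, parse_qs


def extra_headers_for(url: str) -> dict:
    """Return extra response headers for a requested URL.

    Hypothetical helper: adds 'X-Robots-Tag: noindex' only when the
    query string contains an 'add-to-cart' parameter.
    """
    # keep_blank_values=True keeps valueless parameters such as '?page'.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "add-to-cart" in query:
        return {"X-Robots-Tag": "noindex"}
    return {}


print(extra_headers_for("/?page&post_type=product&product=slug&add-to-cart=98063"))
print(extra_headers_for("/product/slug/"))
```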
Why Google Reports Indexed URLs That It Can’t Crawl
Another Redditor offered a possible explanation for why so many URLs appeared in Search Console. They suggested that Google likely discovered links containing the add-to-cart parameters somewhere on the site and added those URLs to its systems.
My advice for the person who originally asked that question is to crawl the website with Screaming Frog, review the internal linking to identify where those pages are being linked from, and then take some action, like removing those links or adding a rel="nofollow" link attribute to them.
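A dedicated crawler will do this across a whole site, but the underlying check is simple enough to sketch. The Python example below uses only the standard library to scan a single page’s HTML for links whose URL contains an add-to-cart parameter and to record whether each one already carries rel="nofollow". The class name, function name, and sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class AddToCartLinkFinder(HTMLParser):
    """Collect <a> links whose query string has an 'add-to-cart' parameter."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (href, has_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        query = parse_qs(urlsplit(href).query, keep_blank_values=True)
        if "add-to-cart" in query:
            # rel may hold several space-separated tokens, e.g. "nofollow noopener".
            rel_tokens = (attr_map.get("rel") or "").lower().split()
            self.found.append((href, "nofollow" in rel_tokens))


def find_add_to_cart_links(html: str) -> list:
    """Return (href, has_nofollow) for each add-to-cart link in the HTML."""
    finder = AddToCartLinkFinder()
    finder.feed(html)
    return finder.found


# Hypothetical sample markup for one page.
sample = (
    '<a href="/shop/?add-to-cart=98063">Add to cart</a>'
    '<a href="/product/slug/">View product</a>'
    '<a rel="nofollow noopener" href="/?add-to-cart=5">Add to cart</a>'
)
for href, has_nofollow in find_add_to_cart_links(sample):
    print(href, "nofollow" if has_nofollow else "followed")
```

Running a check like this over each crawled page would show which templates output the parameterized links and which of those links still lack the nofollow attribute.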
Likely, the best solution is to use the robots.txt block to prevent crawling, as long as it’s understood that this is all it does. If the person wants to be extra sure, they can also identify where those links are and then add the nofollow link attribute as an extra layer, a hint to Google. Nofollow is not a directive, but it is a strong hint.
Search Console Warnings Don’t Always Indicate A Search Problem
One of the recurring challenges with Search Console reports is that they can expose technical conditions that look alarming but actually have little to zero effect on search performance. For example, the 404 error reports are useful for a variety of reasons, but many times a 404 server response is the correct response, and it’s not really an “error” that needs fixing.
Takeaway
Mueller’s response reinforces the takeaway that not every Search Console warning requires taking action to fix something, though in this specific case there may be something to fix in the form of internal links to webpages that use the shopping cart URL parameters. If those links with the shopping cart URL parameters are absolutely necessary, then using a rel="nofollow" link attribute will give Google a strong hint not to follow that link. The joy of technical SEO!
Featured Image by Shutterstock/Orange Line Media