-
Among other things I want to prevent the crawling of feeds and searches. We also recently decided to remove all tags from our pages. Crawlers tend to have long memories, so they keep trying to crawl pages that no longer exist, and I want to prevent that too.
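What I have in mind is something along these lines in robots.txt (just a sketch; the actual paths depend on our URL structure, and the `*` wildcard in paths is not part of the original robots.txt standard, though the major crawlers honor it):

```
User-agent: *
Disallow: /*/feed/
Disallow: /?s=
Disallow: /search/
Disallow: /tag/
```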
In most of these cases the URLs refer to another page (the page without /feed, or the searched page) or to pages that no longer exist (the tag pages). I do not think the noindex directive can be used in these cases.
In any case, the use of the noindex directive seems impractical, as it appears to involve a lot of manual HTML coding.
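For reference, the manual approach would mean putting something like this in the `<head>` of every page we want kept out of the index:

```
<meta name="robots" content="noindex, nofollow">
```

And of course that only works for pages that still exist, which is exactly the problem with the removed tag pages.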
-
Check the Yoast SEO options: for example, try disabling the indexing of tags under SEO -> Search Appearance -> Taxonomies -> Tags.
Bye!
-
That only works if you do have tags but do not want them indexed. It does not work if you no longer have tags and want to stop crawlers from looking for those non-existing tags.
-
What if I use .htaccess to do a rewrite that redirects the crawlers to the robots.txt in the blog.pianetadonna.it/mysite/ directory? Do you think that could work?
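Something along these lines is what I mean (only a sketch, assuming the platform runs Apache with mod_rewrite; /mysite/ stands in for our subdirectory):

```
# Sketch: serve our own robots.txt from the subdirectory
# whenever the root robots.txt is requested.
RewriteEngine On
RewriteRule ^robots\.txt$ /mysite/robots.txt [L]
```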
-
The .htaccess file is self-generated and is restored with each update.
In any case, that rule would have to be written in the root .htaccess, to which you do not have access.
Bye!
-
Any other suggestions then?
-
I don't understand why this solution would not work: the crawlers will not index the tags, and if you have no tags, there is nothing for them to index.
Bye!
-
From what I read, the Yoast solution is focused on the use of the XML sitemap.
To start with, we do not use an XML sitemap for the tags, so using the Yoast SEO plugin to keep them out of a sitemap is not really useful.
Second, even without a sitemap, all kinds of crawlers still find it necessary to crawl the tag pages. As we took the tags off, we do not have any tag pages, and therefore crawlers try to crawl non-existing pages over and over again. We would like to prevent that.
I do not think we can achieve that by using or not using a sitemap. So unless the Yoast SEO plugin uses another trick to prevent the indexing that I am not aware of, I do not think it can provide a solution for our problem.
So, summarizing:
The most straightforward solution seemed to be the exclusion of tags in the robots.txt file, as this is the standard (if perhaps a bit outdated) way of directing crawlers. But you already pointed out that this is not an option, as your setup of the pianetadonna platform prevents the robots.txt file from working properly.
Alternatively, we could start thinking about a noindex tag. But if you want to place this noindex tag in the header of the individual pages, you run into the problem that the tag pages no longer exist. So that is a non-starter.
That is why we started thinking about doing a rewrite using the .htaccess file. But you pointed out that, again due to the setup of the pianetadonna platform, this will not work, as the rewrite would have to be done in the .htaccess file in the root of the pianetadonna platform, and as you pointed out, we do not have access to that.
Any other suggestion?