robots.txt

**msp** · 02-04-2020, 08:14 AM

Hi,

I would like some information about how the robots.txt file works on the pianetadonna platform. I have noticed two things that make me wonder.

The first thing is that the entries in my robots.txt file do not seem to work all that well. The entry "Disallow: /search*" should prevent the indexing of search URLs, but according to the Google Search Console, Googlebot still tries to index these search URLs. A similar example is the entry "Disallow: /*feed*", which should prevent the indexing of all feed URLs.

I wonder if this could be due to a syntax error in my entries. Alternatively, my question would be whether the robots.txt file is working properly at all.
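To help check the syntax question, here is a minimal sketch of Google-style wildcard matching, where `*` matches any run of characters, a trailing `$` anchors the end, and every rule is compared against the URL path starting from the root of the host. The helper names `rule_to_regex` and `rule_matches` and the sample paths are hypothetical, and this is an approximation of the matching rules, not Googlebot's actual implementation.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt rule path into a regex (illustrative only)."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    # '*' matches any sequence of characters; everything else is literal.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        pattern += "$"
    return re.compile(pattern)


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Rules are matched from the start of the URL path (the host root)."""
    return rule_to_regex(rule_path).match(url_path) is not None


# Hypothetical paths, used only for illustration.
print(rule_matches("/search*", "/search?q=shoes"))      # rule applies
print(rule_matches("/search", "/search?q=shoes"))       # same result without '*'
print(rule_matches("/search*", "/msp/search?q=shoes"))  # path starts with /msp/
print(rule_matches("/*feed*", "/msp/feed/"))            # leading '*' covers /msp/
```

Under these assumptions both entries are syntactically fine and the trailing `*` is simply redundant. The thing to watch is the leading path: a rule such as `/search*` only covers paths that begin with `/search` at the root of the host.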

The second thing I noted is that I get a lot (thousands) of URL exclusions reported by the Google search console. The reason for these exclusions is "Blocked by robots.txt". If I try to inquire further the Google search console states that I do not have a robots.txt. I do have a robots.txt. However, in this robots.txt I do not exclude any pictures.

I wonder if this could be an error in the Google Search Console reporting. In other words, maybe the indexing of the pictures is blocked by something other than the robots.txt, while the Search Console reports it as a robots.txt block anyway. Alternatively, I wonder if there is a robots.txt file at a higher level on the pianetadonna platform that blocks the indexing of pictures.

I hope you can help me out.

Thanx,
Gert

**alemoppo** · 02-04-2020, 09:42 PM

Hello, can you provide an example URL of a picture blocked by the robots.txt?

Bye!

**msp** · 02-05-2020, 05:28 AM

https://blog.pianetadonna.it/msp/zup...nale-tavola-2/

**alemoppo** · 02-06-2020, 07:07 PM

A robots.txt file placed in a subdirectory is ignored. Your robots.txt file is not used by Google when it indexes content. Can you provide a screenshot of the error messages so we can better understand the problem?

Bye!
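As a rough illustration of why a file in a subdirectory is ignored: crawlers request robots.txt only from the root of the host, whatever the path of the page being crawled. The sketch below derives that location from a page URL. The function name `robots_txt_url` is hypothetical, and the code only computes the address without fetching anything.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location that applies to page_url.

    Only scheme and host are kept; the page's own path is discarded,
    because robots.txt is looked up at the root of the host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A blog served under a path shares the host-level file.
print(robots_txt_url("https://blog.pianetadonna.it/msp/"))
# A blog on its own subdomain gets its own file.
print(robots_txt_url("https://yoursite.altervista.org/some/page/"))
```

For a blog served under `https://blog.pianetadonna.it/msp/`, the file that counts is therefore the one at the root of `blog.pianetadonna.it`, not one stored inside `/msp/`.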

**msp** · 02-06-2020, 07:54 PM

Do I understand correctly that you set up the pianetadonna platform in such a way that the websites running on it are positioned as a subdirectory instead of as a root directory? And that, as a direct result of this setup, users of the pianetadonna platform (and other Altervista platforms) cannot use the robots.txt file to direct the search engine indexing crawlers? This could potentially harm search engine positioning and, with that, the revenues from advertisements. Do you have any workarounds for this small but quite annoying problem?

With regard to the second part of my question: http://tinyurl.com/us8o5fw

Thanx,
Gert

**alemoppo** · 02-07-2020, 10:02 PM

Originally Posted by msp

Do I understand correctly that you set up the pianetadonna platform in such a way that the websites running on it are positioned as a subdirectory instead of as a root directory?

Your site is: https://blog.pianetadonna.it/msp/

Originally Posted by msp

And that as a direct result of this setup users of the pianetadonna platform (and other Altervista platforms) cannot use the robots.txt file to direct the search engine indexing crawlers? This could potentially harm search engine positioning and with that the revenues from advertisements. Do you have any workarounds for this little but quite annoying problem?

The robots.txt file is not used to index the site. How could it potentially harm search engine positioning and revenues? Please read this page.
On the "AlterVista platforms" you can use the robots.txt because the URL looks like "yoursite.altervista.org".

The second thing I noted is that I get a lot (thousands) of URL exclusions reported by the Google search console. The reason for these exclusions is "Blocked by robots.txt".

With regard to the second part of my question: http://tinyurl.com/us8o5fw

I'm sorry, but I didn't see the "Blocked by robots.txt" string in your image.
I don't recommend using Wordfence or other security plugins because they can damage your blog and are useless on AlterVista.

P.S.: I just noticed that your site is in Italian. Why don't you ask on the Italian forum?

Bye!

**msp** · 02-08-2020, 09:23 AM

Is this any better?

http://tinyurl.com/srklav5

**msp** · 02-08-2020, 09:29 AM

Do I understand correctly that you set up the pianetadonna platform in such a way that the websites running on it are positioned as a subdirectory instead of as a root directory? And that, as a direct result of this setup, users of the pianetadonna platform (and other Altervista platforms) cannot use the robots.txt file to direct the search engine indexing crawlers?

So this statement seems to be correct.

Which leaves the question:

Do you have any workarounds for this little but quite annoying problem?

**msp** · 02-08-2020, 09:42 AM

The robots.txt file is not used to index the site.

Really? I thought that the robots.txt tells a search engine crawler which URLs on a site it is allowed to crawl and which ones it is not. With a working robots.txt you could block crawlers from indexing your site altogether, provided the crawler does not disregard the robots.txt, of course. Or so I thought. Do you think this is not true?
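To make that understanding concrete, here is a small sketch using Python's standard `urllib.robotparser` to ask whether a crawler may fetch a URL under a given set of rules. The rules and the `example.com` URLs are made up for illustration and are not the real file from this thread. The example sticks to plain prefix rules, and it only answers the crawling question: a rule that blocks fetching is not by itself a guarantee about what ends up in the index.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_lines: list[str], user_agent: str, url: str) -> bool:
    """Parse robots.txt lines and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: block everything for every crawler.
block_all = ["User-agent: *", "Disallow: /"]
print(may_crawl(block_all, "Googlebot", "https://example.com/any/page"))

# Hypothetical rules: block only paths that start with /search.
block_search = ["User-agent: *", "Disallow: /search"]
print(may_crawl(block_search, "Googlebot", "https://example.com/search?q=x"))
print(may_crawl(block_search, "Googlebot", "https://example.com/about/"))
```

A well-behaved crawler performs a check like this before requesting a page, which is why the rules only have an effect when they sit in the file the crawler actually reads.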

**msp** · 02-08-2020, 09:45 AM

p.s. Do you think I could ask questions on the Italian forum in English? My Italian is limited to "mama mia" and some poetic swearing so that would hamper the Italian conversation quite a bit.
