robots.txt

**alemoppo** · 02-08-2020, 01:23 PM

Originally Posted by msp

Really? I thought that the robots.txt tells a search engine's indexing crawler which URLs on a site it is allowed to crawl.

Please read this page.

is this any better?

http://tinyurl.com/srklav5

Can you please provide the whole page? I don't see necessary information like the URL.

My Italian is limited to "mama mia"

OK, but your site is in Italian, so I guess you're working for other people. No problem.

Bye!

**msp** · 02-08-2020, 03:17 PM

The message from the search console is related to the link I provided earlier: https://blog.pianetadonna.it/msp/zup...nale-tavola-2/

Regarding the general use of the robots.txt: on the page you want me to read (https://support.google.com/webmaster.../6062608?hl=en) it states:

What is robots.txt used for? robots.txt is used primarily to manage crawler traffic to your site.

This seems to support my suggestion that the robots.txt can be used to direct the search engine indexing crawlers.
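To make the "manage crawler traffic" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `example.com` URLs, and the `is_allowed` helper are made up for illustration; they are not taken from pianetadonna or from Google's page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, only for illustration.
RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if the given robots.txt rules let user_agent crawl url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/page.html",
        "https://example.com/public/page.html",
    ):
        print(url, is_allowed(RULES, "Googlebot", url))
```

Note that this only answers "may the crawler fetch this URL"; it says nothing about whether the URL ends up in the index.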

I think you are quite an expert on this subject. For less experienced people, I would suggest reading this guide (https://yoast.com/ultimate-guide-robots-txt/). As Google is, for obvious reasons, very unclear about its indexing, this text provides some useful insights.

Thanx,
Gert

**alemoppo** · 02-08-2020, 04:30 PM

Originally Posted by msp

The message from the search console is related to the link I provided earlier: https://blog.pianetadonna.it/msp/zup...nale-tavola-2/

I'm sorry, but I need the whole screenshot because I have to show it to the AlterVista technicians. Could you please provide it?
Thank you.

Bye!

**msp** · 02-08-2020, 05:01 PM

I hope this one will do the trick:

http://tinyurl.com/uznl9mj

**alemoppo** · 02-09-2020, 11:22 AM

Originally Posted by msp

http://tinyurl.com/uznl9mj

Thank you.
The URL https://blog.pianetadonna.it/msp/zup...nale-tavola-2/ does not seem to be an article (I see a 301 redirect to an image).
I don't see any reference to the "robots.txt" file on the screen; the URL simply cannot be indexed because the article does not exist.

Do you have any redirects set by plugins like "Redirection"?

Bye!

**msp** · 02-09-2020, 03:44 PM

Hi,

The problem I am trying to solve has to do with images and attachments. For now I would like to concentrate on the images. As I said, in the Search Console I have thousands of crawl anomalies with the message "blocked by robots.txt". However, as we established earlier, I cannot have a working robots.txt in a subdirectory of the pianetadonna platform.

http://tinyurl.com/qwyr9fx
http://tinyurl.com/r8n4e4u
http://tinyurl.com/wz2dw94
http://tinyurl.com/saztr2s
http://tinyurl.com/t7c3vaz

And of course I have redirected URLs. None of these redirects has anything to do with images. But even if they did, that would not explain the "blocked by robots.txt" message I am getting.

Thanx,
Gert

**alemoppo** · 02-11-2020, 10:11 PM

Hello, this is an incorrect message from the Google Search Console: the URL inspection tool only provides information regarding indexing for web pages and not images.

This behavior occurs on any site, even on other hosting.
It was confirmed unofficially on Twitter by a Google employee: https://twitter.com/JohnMu/status/1129304751160610816

Bye!

**msp** · 02-12-2020, 09:01 AM

I am officially impressed that you came up with such an obscure message.

While it does not solve the problem, it at least sets my mind at ease.

Thanx,
Gert

**msp** · 02-12-2020, 09:13 AM

That leaves us with my questions about how the robots.txt works (or rather does not work) on the pianetadonna platform.

We established that sites on the pianetadonna platform cannot use their own robots.txt, because the file must be placed in the root directory of the host. For sites on the pianetadonna platform, the root is the blog.pianetadonna.it directory. Therefore a robots.txt placed in the blog.pianetadonna.it/mysite.it directory will not work.
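As a rough illustration of why the subdirectory file is ignored: crawlers derive the robots.txt location from the scheme and host only and discard the path. A small sketch with the standard library follows; the `robots_txt_url` helper and the `some-post` path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that applies to page_url.

    Only the scheme and host are kept; the path of the page is dropped,
    which is why a file in a subdirectory is never consulted.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # "some-post" is a made-up path used only for this example.
    print(robots_txt_url("https://blog.pianetadonna.it/msp/some-post/"))
```

So every blog under blog.pianetadonna.it shares the single robots.txt at the host root.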

Do you know if there is any other way for me to manage search engine crawler traffic?

**alemoppo** · 02-12-2020, 09:42 PM

Exactly, you can't use the robots.txt file.

According to this page, if you want to hide content from search engines, you have to use noindex directives.

To keep a web page out of Google, you should use noindex directives.
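For reference, noindex is usually expressed either as a `<meta name="robots" content="noindex">` tag in the page or as an `X-Robots-Tag: noindex` HTTP response header. The sketch below checks a page for either form; the `has_noindex` helper and the sample HTML are illustrative assumptions, and how you actually set the tag depends on your blog software.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html, headers=None):
    """Return True if the HTML or the response headers carry noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

Keep in mind that a crawler has to be able to fetch the page in order to see the noindex directive.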

Normally, you don't need to use the robots.txt file. What did you want to do in particular?

Bye!
