500 Error Response for CDN Robots.txt Can Cause Issues

500 Response on Robots.txt Fetch Can Cause Loss of Rich Results

Google’s John Mueller received feedback about a bug in how Search Console validates rich results. Google will drop images from rich results because of an error in how a CDN that hosts the images handles a request for a non-existent robots.txt. The discovered bug is that Search Console and Google’s Rich Results Test fail to alert the publisher to the error and instead give the structured data a successful validation.

A 500 response to a robots.txt fetch causes loss of rich results, while a bug in Search Console fails to diagnose the issue.

Search Engine Journal

When a software program behaves in an unexpected way, it is referred to as a bug. A bug isn’t always a code issue; it can also be a failure to anticipate a situation, which leads to unexpected consequences, such as this one.

The publisher who asked the question attempted to diagnose the cause of their disappearing rich results using Google’s tools, but was startled to discover that the tools were useless in this case.

While this problem affected the image preview for recipe rich results, it could conceivably arise in other situations as well, so it’s worth being aware of.

Recipe Rich Results Image Previews Disappeared

The individual asking the question provided a background of what happened.

He narrated what happened:

“We ran into a bit of a tiger trap, I would say, in terms of rich recipe results.

We have hundreds of thousands of recipes which are indexed and there’s lots of traffic coming through from the recipe gallery.

And then… over a period of time it stopped.

And all of the metadata checked out and Google Search Console was saying …this is all rich recipe content, it’s all good, it can be shown.

We finally noticed that in the preview, when you preview the result, the image was missing.

And it seems that there was a change at Google and that if a robots.txt was required in order for images to be retrieved, then nothing we could see in the tools was actually saying anything was invalid.

And so it’s a bit awkward right, when you check something to say “is this a valid rich recipe result?” and it says yea, it’s great, it’s absolutely great, we’ve got all the metadata.

And you check all the URLs and all the images are right, but it turns out behind the scenes, there was a new requirement that you have a robots.txt.”

John Mueller asked:

“How do you mean that you had to have a robots.txt?”

The person asking the question responded:

“What we found is, if you requested the robots.txt from our CDN, it gave you like a 500.

When we put a robots.txt there, immediately the previews started appearing correctly.

And that involves crawling and putting it onto a static site, I think.

So we operationally, we found adding that robots.txt did the job.”

Mueller nodded his head and said:

“Yeah, okay.

So from our point of view, it’s not that a robots.txt file is required. But it has to have a proper result code.

So if you don’t have one, it should return 404.

If you do have one, then we can obviously read that.

But if you return a server error for the robots.txt file, then our systems will assume that maybe there is an issue with the server and we won’t crawl.

And that’s kind of something that’s been like that since the beginning.

But these kinds of issues where especially when you are on a CDN and it’s on a separate hostname, sometimes that’s really hard to spot.

And I imagine the rich results test, at least as far as I know, it focuses on the content that is on the HTML page.

So the JSON-LD markup that you have there, it probably doesn’t check to see if the images are actually fetchable.

And then if they can’t be fetched then, of course, we can’t use them in the carousel, too.

So that might be something that we need to figure out how to highlight better.”

-John Mueller

500 Error Response for CDN Robots.txt Can Cause Issues

This is one of those show-stopping SEO problems that are hard to diagnose but can cause a lot of damage, as the person asking the question noted.

Normally, a crawl for a non-existent robots.txt should result in a 404 server response code, which means the robots.txt does not exist. So if the request for a robots.txt file is generating a 500 response code, that’s an indication that something on the server or the CMS is misconfigured.

The short-term solution is to upload a robots.txt file. But it’s a good idea to dig into the CMS or server to find the underlying issue.
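One way to audit this is to fetch robots.txt from every hostname that serves your images (including the CDN) and map the status code to the crawl behavior Mueller described. The sketch below does that; the hostname and helper names are illustrative, and the status-code mapping reflects Mueller’s explanation rather than an official API:

```python
import urllib.request
import urllib.error


def classify_robots_status(status: int) -> str:
    """Map a robots.txt HTTP status to the crawl behavior Mueller described."""
    if status == 200:
        return "robots.txt found; rules are read and crawling follows them"
    if 400 <= status < 500:
        return "treated as no robots.txt; crawling proceeds"
    if 500 <= status < 600:
        return "treated as a possible server problem; crawling stops"
    return "other; behavior depends on redirects and retries"


def check_robots(host: str) -> str:
    """Fetch https://<host>/robots.txt and classify the response code."""
    url = f"https://{host}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # urllib raises on 4xx/5xx; the code is still available
    return classify_robots_status(status)
```

Running something like `check_robots("cdn.example.com")` for each image hostname would have surfaced the 500 immediately, even while the Rich Results Test was reporting the markup as valid.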

500 Response Code for a Robots.txt Fetch

A 500 server error response code sometimes happens when something in the code is unexpected or missing, and the server responds by halting execution and returning the 500 response code. For example, if you edit a PHP file and forget to close a block of code, the server may give up processing it and throw a 500 response.

Whatever the reason for the error response when Google tried to fetch the robots.txt, this is a good issue to keep in mind for that rare situation when it happens to you.
