Copied website not indexed by Google

Hi all,

I am new to Discourse but have been using Grav for a few years now. Thanks for the great work here.

One site I have up and running is nice, and I want to reuse its theme and some of its content.
My plan was to take a backup, copy it over to a new server, extract it there, make it visible under another URL, modify the content and pictures, and be done.

But here is my question:

  • A few months in, the new website (derived from the backup) still does not show up on Google; even when I search for the exact URL, it is not found. Why is that? Is a backup from one site not suited as a starting point for another website? The old website from which I derived the backup is well indexed in Google.

I have some ideas why the backup might not be ideal as a starting point for a new website, but I couldn’t quite figure out why this would prevent a derived site from being unique and visible to Google.

  1. Maybe some unique identifiers (like security tokens) have to be recreated?

Has anyone of you tried a similar approach to reuse a Grav installation?

Or does anyone here have a suggestion on how to edit the new website so that it becomes an independent and visible website?

Thanks for any hints.

Do you maybe have a different robots.txt on the new site?

@MJR, Google only indexes sites it can find…

  • Is there any other indexed site that links to your site?
  • Did you submit your site to Google?

And there may be other reasons why Google doesn’t index it:

  • Google might consider the website not relevant or trustworthy.
    • Is the content different enough from the original website?
    • Does it have a different sitemap?
    • Does it…
  • Has the Googlebot been blocked in robots.txt?
  • Have your pages been set to noindex?
  • Other reasons…?

Hi @Karmalakas,

thanks for the hint about robots.txt. Actually, I didn’t change the robots.txt. Here is the content:

User-agent: *
Disallow: /backup/
Disallow: /bin/
Disallow: /cache/
Disallow: /grav/
Disallow: /logs/
Disallow: /system/
Disallow: /vendor/
Disallow: /user/
Allow: /user/pages/
Allow: /user/themes/
Allow: /user/images/
Allow: /
Allow: *.css$
Allow: *.js$
Allow: /system/*.js$

which is the default Grav robots.txt.
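As a quick sanity check, those rules can be fed to Python’s stdlib `robotparser` to see what Googlebot is allowed to fetch. This is only a sketch with an abbreviated rule set and a placeholder domain, not the live site:

```python
from urllib import robotparser

# Feed an abbreviated version of the default Grav rules directly to the
# parser (no network fetch), then ask what Googlebot may crawl.
rules = """\
User-agent: *
Disallow: /backup/
Disallow: /cache/
Disallow: /system/
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/"))         # True
print(rp.can_fetch("Googlebot", "https://example.com/cache/x"))  # False
```

So the homepage and page content are crawlable with the defaults, which supports the conclusion that robots.txt is not the culprit here.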

That should not cause the problem then, right?

Hi @pamtbaau,

I actually have two sites that I forked from the backup of the original site.

I submitted both of them to Google for indexing a few days ago.

One of them was successfully indexed, and another indexed site links to it. There the problem seems to be solved by submitting the site via Google Search Console.

However, for the second site, Google shows an indexing error: ‘redirect error’.

Here robots.txt is the unchanged Grav default, so no problem there.

But Google explains that redirect errors can be caused by:

  • A redirect chain that was too long
  • A redirect loop
  • A redirect URL that eventually exceeded the max URL length
  • A bad or empty URL in the redirect chain
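To make the first two causes concrete, here is a minimal sketch of what a crawler’s redirect-following amounts to. The `redirects` dict stands in for the server’s `Location` headers; the URLs and the hop limit are illustrative assumptions, not Grav or Google internals:

```python
# Hypothetical helper: follow a chain of redirects the way a crawler
# would, and classify the outcome. `redirects` maps a URL to the URL
# its server response redirects to.

MAX_HOPS = 10  # crawlers give up after a small number of hops

def follow_redirects(start, redirects, max_hops=MAX_HOPS):
    seen = []
    url = start
    while url in redirects:
        if url in seen:
            return "redirect loop", seen + [url]
        seen.append(url)
        if len(seen) > max_hops:
            return "chain too long", seen
        url = redirects[url]
    return "ok", seen + [url]

# A loop such as http -> https -> http is one common misconfiguration:
loop = {
    "http://sub.example.com/": "https://sub.example.com/",
    "https://sub.example.com/": "http://sub.example.com/",
}
print(follow_redirects("http://sub.example.com/", loop)[0])  # redirect loop
```

In practice, `curl -sIL http://your-site/` shows the actual chain of `Location` headers your server sends, which is the quickest way to spot such a loop.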

But since I just copied the indexed site (via the backup, see the explanation in my first post), I can’t imagine what redirect errors I introduced.

Both the original site and the copied site use the default Grav .htaccess.

One thing to mention, maybe: the problematic site is on a sub-domain, and the parent domain hosts a fully indexed site.

Do you have any ideas what these redirect errors mean?

@MJR, OK, the initial issue seems to be solved. Please mark the issue as such.

I don’t think the redirect error has anything to do with Grav as long as its defaults are being used.

Google gives a few reasons for what could have gone wrong. You will have to inspect the URLs on all your sites. You might find some directions/suggestions on the web about tackling these issues. You’re not the first…

Just for the sake of completeness in this thread: I am looking into the http → https redirection of my hosting. This may cause the so-called redirect loop or chain that Google complains about.

Actually, while searching for a solution to the problem of not showing up on Google, I found that the Grav option “force ssl” may have an impact on a valid redirect chain. I can’t explain the details, as I don’t know enough about it. But the hosting service is probably doing a redirect automatically?

Meanwhile, I will try the “force_ssl” setting in Grav and see if this changes Google’s judgement.
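For reference, the setting lives in `user/config/system.yaml` (the key name is taken from Grav’s shipped `system.yaml` defaults; verify against your version). The interaction to watch for: if the host already redirects http → https at the server level, Grav issuing its own redirect on top of that can produce exactly the kind of chain Google flags:

```yaml
# user/config/system.yaml — overrides the shipped system/config/system.yaml
# If the hosting panel already redirects http -> https, enabling this as
# well means Grav may add a second redirect of its own on some requests.
force_ssl: true
```

If the host handles the redirect, leaving this at `false` avoids doubling up.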

You need to ask your host. That would be a sane thing to do by default. Commodity hosts often don’t do sensible things, though.