Duplicate content

Hello, my blog posts are duplicated at two URLs, one with .html and one without:
mysite/post1.html
mysite/post1

It’s not good for Google. How can I fix that? In the settings, Append URL extension is empty.

What is this bit of your question saying exactly? You’ve tried this? You’re wondering if it would work?

I’m talking about the fact that the same article is available at two URLs, one with .html on the end and one without.
It’s bad for SEO.


Why do I have two URLs, with and without .html, if I haven’t added .html in the settings?

@NataliaB, There are even more urls pointing to the same content. Try:

https://mydomain/typography
https://mydomain/typography.html
https://mydomain/typography.htm
https://mydomain/typography.xml
https://mydomain/typography.txt
https://mydomain/typography.json
https://mydomain/typography.rss
https://mydomain/typography.atom

You are claiming this behaviour is bad for SEO, but I beg to differ. Yes, it isn’t optimal if Google finds multiple URLs for the same content. However, Google will not find the duplicate URLs…

  • The Sitemap plugin will only generate a single URL for Google to index.
  • Google will not try all possible extension types on the url provided by the sitemap.
  • If you use <link rel="canonical" href="{{ page.url(true, true) }}" /> in your theme (see Quark base.html.twig), each url will return a page with the correct canonical telling Google which url is the preferred one.

So, I don’t really see a problem…
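
If you want to check the canonical point for yourself, here is a minimal Python sketch that pulls the canonical URL out of a page's HTML. The `CanonicalFinder` and `find_canonical` names and the sample responses are made up for illustration; on a real site you would fetch each URL variant and pass in the response body.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical response bodies for two URL variants of the same page.
sample_head = (
    '<html><head>'
    '<link rel="canonical" href="https://mydomain/typography" />'
    '</head><body></body></html>'
)
pages = {
    "https://mydomain/typography": sample_head,
    "https://mydomain/typography.html": sample_head,
}

# If every variant declares the same canonical, this set has one element.
canonicals = {find_canonical(body) for body in pages.values()}
print(canonicals)
```

If the set contains more than one URL, or `None`, the theme is probably not emitting the canonical tag consistently.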

I understood what the problem was but I asked “What is this bit of your question saying exactly?” and then showed you that bit. It can be difficult to provide help if your posts aren’t clear.

I agree with @pamtbaau though, there is no problem if you don’t share or link to the non-canonical URL. There is also the sitemap and <link rel="canonical"> tag in the HTML to specify your canonical URL, so it’s pretty clear to search engines.

You could probably put a rewrite rule in if you think Google is really that stupid (I don’t know, maybe it is??).

By the way, there are plenty of better reasons than SEO to have canonical URLs. Visited links and caching, for example. I think if you slavishly build your site for SEO, it might work for a while but not be as interesting or useful for humans. We’ve all seen those ridiculous recipe sites. (Sorry that turned into a mini rant. :slight_smile: )

If you do decide to create the rewrite rule mentioned above, don’t forget that crawlers look for a robots.txt file.
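
To illustrate the robots.txt point, here is a rough Python sketch of the decision such a rule would have to make. It is not an actual web server rule, only the logic behind one. The `redirect_target` name, the `STRIP_EXTENSIONS` set (copied from the list of extensions earlier in this thread) and the `KEEP_PATHS` set are assumptions for the example.

```python
from posixpath import splitext

# Assumption: the extensions a broad rule might strip,
# taken from the URL list earlier in the thread.
STRIP_EXTENSIONS = {".html", ".htm", ".xml", ".txt", ".json", ".rss", ".atom"}

# Paths that must be served as-is. Crawlers request /robots.txt directly,
# so redirecting it to /robots would hide it from them.
KEEP_PATHS = {"/robots.txt"}


def redirect_target(path):
    """Return the extensionless path to redirect to, or None to serve as-is."""
    if path in KEEP_PATHS:
        return None
    root, ext = splitext(path)
    if ext.lower() in STRIP_EXTENSIONS:
        return root
    return None


for path in ["/typography.html", "/typography", "/robots.txt"]:
    print(path, "->", redirect_target(path))
```

Whatever form the real rule takes, the same exception list applies: any file that is meant to be fetched with its extension has to be excluded before the extension is stripped.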

The funniest way to deal with .html can be seen on the Grav root…

https://getgrav.org/index.html

at point 5 :crazy_face: :face_with_peeking_eye: :rofl: