How to add a robots meta tag to the HTTP header for specific URLs with nginx?

Hello,

I have a website made with Grav and I would like to add the HTTP header X-Robots-Tag: noindex only to URLs containing the pattern “category:”, like

https://example.com/blog/article/category:name-of-the-category or
https://example.com/blog/category:name-of-the-category

So I've tried to modify the file /etc/nginx/sites-available/grav by adding

location ~ .*category:.* {
    add_header X-Robots-Tag noindex;
}

inside or outside the location ~ \.php$ { } block, without success.

I am not familiar with nginx or Grav configuration files, but I would like to learn, so I've done some research.

I've read the nginx documentation pages for add_header, IfIsEvil and try_files, and I still can't manage to make it work.

Can anyone give me a hint about this problem?

I don't want to use PHP inside a page itself or a Twig file, especially because these pages are automatically generated (they are not in my Pages dashboard).

Any ideas?

Thanks in advance.

No solutions yet.

So? Any ideas, someone?

You need to upload a robots.txt file to the public_html directory on your server.
Generate your robots.txt with:
http://tools.seobook.com/robots-txt/generator/

Hello,

Thank you for your response.
I think you may be confusing crawling with indexing.
While the robots.txt file is there to prevent robots from crawling the given URLs, it does not tell them to de-index those URLs from their databases (which is the result I want).

It is true that, in the past, the unsupported noindex directive worked in robots.txt, but it no longer does (see https://searchengineland.com/google-to-stop-supporting-noindex-directive-in-robots-txt-319003).

I want to allow access to those URLs (with a 200 HTTP code) because they exist, but I don't want them to be indexed. This is why I am trying to send the noindex X-Robots-Tag in the HTTP header.

Configuring nginx seems tricky because the URL exists but is “generated” by Grav (or the Grav theme I use), even though the page itself doesn't explicitly exist in the Pages dashboard in the backend. They are “category” URLs, i.e. page collections.

I am still stuck on this, so if anyone visiting this post has an idea, please share it with me (and with others reading).

Thank you.

Hello,

For those interested, I got help from someone in the #nginx channel on Freenode IRC. Here is the solution that does the trick (it wasn't easy to find and is not trivial).

Solution: Robot tag in HTTP Header
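    # map the original client request URI to a variable: empty by default,
    # "noindex" for any URI containing "category:"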
    map $request_uri $x_robots {
      default "";
      ~category: noindex;
    }
     
    server {
        index index.html index.php;
     
        ## Begin - Server Info
        root /home/grav/www/html;
        server_name example.com www.example.com;
        ## End - Server Info
     
        ## Begin - Index
        # for subfolders, simply adjust:
        # location /subfolder {
        # and the rewrite to use /subfolder/index.php
        location / {
            try_files $uri $uri/ /index.php?$query_string;
        }
        ## End - Index
     
        ## Begin - Security
        # deny all direct access for these folders
        location ~* /(\.git|cache|bin|logs|backup|tests)/.*$ { return 403; }
        # deny running scripts inside core system folders
        location ~* /(system|vendor)/.*\.(txt|xml|md|html|yaml|yml|php|pl|py|cgi|twig|sh|bat)$ { return 403; }
        # deny running scripts inside user folder
        location ~* /user/.*\.(txt|md|yaml|yml|php|pl|py|cgi|twig|sh|bat)$ { return 403; }
        # deny access to specific files in the root folder
        location ~ /(LICENSE\.txt|composer\.lock|composer\.json|nginx\.conf|web\.config|htaccess\.txt|\.htaccess) { return 403; }
        ## End - Security
     
        add_header X-Robots-Tag $x_robots;
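        # ($x_robots comes from the map at the top of this file; when it is
        # empty, which is the default, nginx does not send the header at all)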
     
        ## Begin - PHP
        location ~ \.php$ {
           # fastcgi_index index.php;
            include snippets/fastcgi-php.conf;
     
            # Choose either a socket or TCP/IP address
            fastcgi_pass unix:/var/run/php/php7.3-fpm.sock;
     
            # legacy (e.g. PHP 5) logic
            # fastcgi_pass unix:/var/run/php5-fpm.sock;
            # fastcgi_pass 127.0.0.1:9000;
            # fastcgi_split_path_info ^(.+\.php)(/.+)$;
     
        }
        ## End - PHP
     
        listen [::]:443 ssl ipv6only=on; # managed by Certbot
        listen 443 ssl; # managed by Certbot
        ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem; # managed by Certbot
        ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem; # managed by Certbot
        include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
        ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
     
    }
     
    server {
        if ($host = www.example.com) {
            return 301 https://example.com$request_uri;
        } # managed by Certbot

        if ($host = example.com) {
            return 301 https://$host$request_uri;
        } # managed by Certbot
     
        listen 80;
        listen [::]:80;
        server_name example.com www.example.com;
        return 404; # managed by Certbot
    }

The key pieces are the map block at the very top and the add_header X-Robots-Tag $x_robots; line inside the server block.

It uses the nginx map module together with try_files. As I understand it, these “category:” pages are generated by Grav, so try_files internally rewrites the request to /index.php and a location matched on the original URI never ends up serving the response; $request_uri, on the other hand, always keeps the URI the client actually requested, and the map turns it into either an empty string or noindex, which add_header then sends only when it is non-empty.
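
In short, if you already have a working Grav server block, these are the only two additions (a condensed sketch of the config above, with placeholder comments; adapt it to your own file):

    # decide per request whether the header should be sent
    map $request_uri $x_robots {
        default "";          # normal URLs: no header
        ~category: noindex;  # URIs containing "category:" get noindex
    }

    server {
        # ... rest of the existing Grav configuration ...
        add_header X-Robots-Tag $x_robots;
        # ... rest of the existing Grav configuration ...
    }

After reloading nginx you can verify it with something like curl -I "https://example.com/blog/category:name-of-the-category" and check that the response contains X-Robots-Tag: noindex (just a quick way to test, not part of the config).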

I hope this will help grav users that have the same request as mine.

@cequejevois, thanks for sharing the solution with the community.

However, it seems that Framebin will shut its doors on Tuesday, July 6, 2021, and your solution will be lost.

Would you mind sharing it in a more permanent way? E.g. using a GitHub gist, or inside your last post using:

[details="Solution: Robot tag in HTTP Header"]
This text will be hidden
[/details]

Which will look like:

Solution: Robot tag in HTTP Header

This text will be hidden

Thanks!

You’re right.

Done!