I don’t want to use PHP inside a page itself or a Twig file, especially because these pages are generated automatically (they don’t appear in my Pages dashboard).
Thank you for your response.
I think you may be confusing crawling these URLs with indexing them.
While robots.txt tells search robots not to crawl the listed URLs, it does not tell them to remove (de-index) those URLs from their databases, which is the result I want.
I want those URLs to stay accessible (with a 200 HTTP status) because they exist, but I don’t want them indexed. That is why I am trying to send a noindex X-Robots-Tag in the HTTP response header.
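For a page I managed myself this would probably be enough, roughly like the sketch below (the path is only an example, not my actual setup):

location /some-static-section/ {
    # response is still 200, but carries the noindex hint
    add_header X-Robots-Tag "noindex";
}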
But configuring this in the nginx config file seems tricky, because the URL exists only as something “generated” by Grav (or the Grav theme I use); there is no corresponding page in the Pages dashboard in the backend. They are “category” URLs, i.e. page collections.
I am still stuck on this, so if anyone visiting this post has an idea, please share it with me (and with others reading).
For those interested, I got help from someone in the #nginx channel on Freenode IRC. Here is the solution that does the trick (it wasn’t easy to find and is not trivial).
Solution: X-Robots-Tag in the HTTP header
map $request_uri $x_robots {
    default     "";
    # any request URI containing "category:" (Grav taxonomy collection URLs)
    ~category:  noindex;
}
server {
    index index.html index.php;

    ## Begin - Server Info
    root /home/grav/www/html;
    server_name example.com www.example.com;
    ## End - Server Info

    ## Begin - Index
    # for subfolders, simply adjust:
    # location /subfolder {
    # and the rewrite to use /subfolder/index.php
    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }
    ## End - Index

    ## Begin - Security
    # deny all direct access for these folders
    location ~* /(\.git|cache|bin|logs|backup|tests)/.*$ { return 403; }
    # deny running scripts inside core system folders
    location ~* /(system|vendor)/.*\.(txt|xml|md|html|yaml|yml|php|pl|py|cgi|twig|sh|bat)$ { return 403; }
    # deny running scripts inside user folder
    location ~* /user/.*\.(txt|md|yaml|yml|php|pl|py|cgi|twig|sh|bat)$ { return 403; }
    # deny access to specific files in the root folder
    location ~ /(LICENSE\.txt|composer\.lock|composer\.json|nginx\.conf|web\.config|htaccess\.txt|\.htaccess) { return 403; }
    ## End - Security
    # add_header at server level is inherited by the locations below
    # (none of them define add_header themselves); nginx omits the header
    # entirely when $x_robots is an empty string
    add_header X-Robots-Tag $x_robots;
    ## Begin - PHP
    location ~ \.php$ {
        # fastcgi_index index.php;
        include snippets/fastcgi-php.conf;
        # Choose either a socket or TCP/IP address
        fastcgi_pass unix:/var/run/php/php7.3-fpm.sock;
        # legacy (e.g. PHP 5) logic
        # fastcgi_pass unix:/var/run/php5-fpm.sock;
        # fastcgi_pass 127.0.0.1:9000;
        # fastcgi_split_path_info ^(.+\.php)(/.+)$;
    }
    ## End - PHP
    listen [::]:443 ssl ipv6only=on; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
server {
    if ($host = www.example.com) {
        return 301 https://example.com$request_uri;
    } # managed by Certbot

    if ($host = example.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80;
    listen [::]:80;
    server_name example.com www.example.com;
    return 404; # managed by Certbot
}
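A note for anyone adapting this: the map block accepts several regex patterns, so other generated collection routes can be kept out of the index the same way. Something like the sketch below could work; the tag: pattern is only an assumed example of another Grav taxonomy URL, so adjust it to whatever your theme actually generates:

map $request_uri $x_robots {
    default     "";
    # Grav taxonomy collections such as /blog/category:photos
    ~category:  noindex;
    # hypothetical extra pattern for tag-based collections
    ~tag:       noindex;
}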