TNTSearch problem with multi-language indexes

Hey there Grav team,

I’ve taken over a Grav project and I’m very happy to be working with it. At first I hadn’t even heard of Grav, but slowly and steadily I’m figuring it all out.
Needless to say, I’m running into some bumps along the way.
One of these bumps is multi-language support in Grav’s TNTSearch plugin.

I started noticing that new blog posts weren’t appearing in search results.
So I used the CLI to reindex all the pages, and noticed two things while doing that:

  1. It skips a lot of pages when indexing (it adds 228 out of 817 pages). This is largely because many pages are unpublished, but it skips a lot of published pages as well.
  2. The number of indexed pages is exactly the same for each language. We run the website in three languages (English, Dutch and German), and an index is built for each language, but every index contains exactly the same number of pages and exactly the same pages. More on this below.

When I var_dump the page objects being added to the index, I see that in the English index most, but not all, page objects have the language code ‘en-gb’. The Dutch and German indexes also contain many page objects with the language code ‘en-gb’. So what seems to happen during indexing is that, for every folder in the ‘pages’ folder, it takes the first markdown file it finds (in most cases page.en-gb.md) and adds that to the index. This happens for the English index, but also for the Dutch and German indexes, so every index is built from exactly the same pages.
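To make the difference concrete, here is a minimal Python model of the behaviour described above. It is not Grav’s or TNTSearch’s actual code (which is PHP); the folder names, the ‘nl’ and ‘de’ language codes and the helper functions are all hypothetical. `first_markdown` mimics what the indexer appears to do (take whichever .md file comes first), while `file_for_language` picks the file matching the language being indexed.

```python
from typing import Dict, List, Optional

# Language codes: 'en-gb' is from the post; 'nl' and 'de' are assumed.
LANGUAGES = ["en-gb", "nl", "de"]

# Hypothetical page folders, with files listed in the order the filesystem
# happens to return them (assumed, not sorted).
PAGES: Dict[str, List[str]] = {
    "blog/post-a": ["page.en-gb.md", "page.nl.md", "page.de.md"],
    "blog/post-b": ["page.en-gb.md", "page.de.md", "page.nl.md"],
    "about": ["page.de.md", "page.en-gb.md", "page.nl.md"],
}


def first_markdown(files: List[str]) -> str:
    """Observed behaviour: take whichever .md file comes first, ignoring language."""
    return next(f for f in files if f.endswith(".md"))


def file_for_language(files: List[str], lang: str) -> Optional[str]:
    """Expected behaviour: take the file for `lang`, or nothing if it is missing."""
    candidate = f"page.{lang}.md"
    return candidate if candidate in files else None


def build_index(lang: str, per_language: bool) -> List[str]:
    """Return the files that would end up in the index for `lang`."""
    chosen = []
    for folder, files in PAGES.items():
        name = file_for_language(files, lang) if per_language else first_markdown(files)
        if name is not None:
            chosen.append(f"{folder}/{name}")
    return chosen


if __name__ == "__main__":
    for lang in LANGUAGES:
        print(lang, "observed:", build_index(lang, per_language=False))
        print(lang, "expected:", build_index(lang, per_language=True))
```

In the "observed" mode all three indexes come out identical and mostly English, with the odd German file wherever that happens to be listed first, which matches what the var_dump output shows.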

If, for instance, the first markdown file it happens to find has `published: false` in its front matter, the whole page is skipped, even if the markdown files for the other languages are published.
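The publishing side of the problem can be sketched the same way. This is again a hypothetical Python model, not the plugin’s code: one page folder whose English file is unpublished while the (assumed) ‘nl’ and ‘de’ files are published. If only the first file’s `published` flag is checked, the page drops out of every index; checking each language’s own file keeps it in the Dutch and German indexes.

```python
from typing import Dict

# Hypothetical `published` front-matter flags for the files in one page folder,
# in the order the filesystem returns them (dicts keep insertion order).
FOLDER: Dict[str, bool] = {
    "page.en-gb.md": False,  # English translation not published yet
    "page.nl.md": True,
    "page.de.md": True,
}


def indexable_observed(folder: Dict[str, bool], lang: str) -> bool:
    """Observed: the first file's flag decides for every language (`lang` is ignored)."""
    first = next(iter(folder))
    return folder[first]


def indexable_expected(folder: Dict[str, bool], lang: str) -> bool:
    """Expected: each language's own file decides; a missing file means not indexed."""
    return folder.get(f"page.{lang}.md", False)


if __name__ == "__main__":
    for lang in ("en-gb", "nl", "de"):
        print(lang, indexable_observed(FOLDER, lang), indexable_expected(FOLDER, lang))
```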

When I dug into the code, I found that a collection of pages is built for each index using the `collection()` function in /Grav/Common/Page/Page.php:2675. As far as I can see, these collections aren’t filtered by page language.

I hope I didn’t miss something. I’m not a back-end developer by profession, but I tried to check this thoroughly. These are the contents of my tntsearch.yaml file:

```yaml
enabled: true
search_route: /search
query_route: /s
built_in_css: false
built_in_js: false
built_in_search_page: false
enable_admin_page_events: false
search_type: basic
fuzzy: true
phrases: true
stemmer: default
display_route: false
display_hits: true
display_time: false
live_uri_update: false
limit: '30'
min: '3'
snippet: '250'
index_page_by_default: true
scheduled_index:
  enabled: true
  at: '0 * * * *'
  logs: logs/tntsearch-index.out
filter:
  items:
    - root@.descendants
powered_by: false
search_object_type: Grav
```

I hope I can get some help with this, because I’ve run out of ideas for solving the problem. Please let me know if you need more information or, if I’ve missed something, give me a nudge in the right direction.

With kind regards,

Jonathan