Grav Site Limits

Hi. It’s 2023, and I’m re-inquiring about the limits and memory requirements of Grav. The topic was previously discussed in 2016 (max ~5000 pages, comfortable ~1000) at Practical Site Limits With Grav, and in 2019 at Grav page limit? (no follow-up).

I have not bothered to alter the default PHP memory limit of 128 MB (I think). I don’t really want to, unless it is only a minor tweak.

I’d like to host a site with 100,000 pages. Is it possible? When trying to load a new (previously working) site with 32,000 pages/folders, I’m seeing:
Fatal error : Allowed memory size of 134217728 bytes exhausted (tried to allocate 12288 bytes) in "D;.…
vendor\rockettheme\toolbox\File\src\AbstractFile.php on line 264 and
(tried to allocate 20480 bytes) in
vendor\filp\whoops\src\Whoops\Util\Misc.php on line 50
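
For reference, 134217728 bytes is exactly 128 MB (128 × 1024 × 1024), so the error is consistent with the default limit being in effect. A minimal PHP sketch for checking this on a given install; `bytesToMegabytes` is a helper name made up for illustration, not something Grav provides:

```php
<?php
// Hypothetical helper (not part of Grav): convert bytes to MB, with 1 MB = 1024 * 1024 bytes.
function bytesToMegabytes(int $bytes): float
{
    return $bytes / (1024 * 1024);
}

// The figure from the fatal error above.
echo bytesToMegabytes(134217728), " MB\n";

// The limit PHP is actually running with, as a php.ini-style string such as "128M".
echo ini_get('memory_limit'), "\n";
```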

I assume Grav is somehow loading up a file system image on start. Is that correct? If it is, I can see how it might have an upper bound if it is always traversing the whole site.

A description of startup memory usage/requirements would be helpful. Anyone with a link? Perhaps I could dive into lightening the load.

I’d like to at least get the site to load.

Are there system variables that can be tweaked to allow loading?
E.g.:
Is the cache initialized with the file system structure, and can that be disabled?
Is there a way to disable the cache at the page/folder level via YAML frontmatter?
Is all frontmatter read at startup?

I would even go so far as to comment out (or conditionally define) non-critical calls in base system classes/code to track down/modify usage or requirements. Have debugger attached. Hints appreciated :slight_smile:

Well, it is perplexing why this is necessary, but it appears that the entire user directory of pages is parsed and loaded into memory at startup in Page.init(), which is called recursively from Pages.init(). Pages.init() is in turn called on every request to make sure the load has happened once. The whole load also repeats if a cache request results in a miss.

Andy - would you please confirm that to be the case?

I would think the more logical (and efficient) strategy would be to load a page only on demand. Obviously, its ancestors would have to be loaded as well, but I don’t see that its children are needed (unless it is a modular page). Could you explain where the requirement comes from that the whole site be recursively parsed and loaded at startup?
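
To make the on-demand idea concrete, here is a rough sketch of which routes would need loading for a single request: the page itself plus its ancestors, and nothing else. `routesToLoad` is a hypothetical function for illustration only; it is not how Grav resolves pages today.

```php
<?php
// Hypothetical sketch (not Grav code): list the routes needed to serve one page
// on demand, i.e. each ancestor followed by the page itself.
function routesToLoad(string $route): array
{
    // Split on "/" and drop empty segments from leading or trailing slashes.
    $segments = array_values(array_filter(explode('/', $route), 'strlen'));

    $routes = [];
    $current = '';
    foreach ($segments as $segment) {
        $current .= '/' . $segment;
        $routes[] = $current;
    }

    return $routes;
}

print_r(routesToLoad('/blog/2023/my-post'));
```

Under this scheme a request touches as many pages as the route is deep, regardless of how many pages the site has in total.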

Thanks!

I’m still hoping against hope that the product is suitable for a much larger repository. In theory, it ought to be able to handle Wikipedia, I’m thinking…:slight_smile:

FWIW, I added
ini_set('memory_limit', '10240M');
to setup.php (intended as a 1 GB limit, though 10240M is actually 10 GB; 1 GB would be '1024M').

It allows the site to load. Every request takes about 45 seconds to complete with debugger attached. :frowning:

Without debugger, through IIS and FastCGI, about 20 seconds and 350 MB per request on a very well-equipped dev machine.
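
Figures like these can be collected with a few lines placed early in the request, for example next to the ini_set call. This is a minimal sketch: `formatRequestStats` is a made-up helper name, and where the output ends up depends on the local error_log configuration.

```php
<?php
// Hypothetical helper (not part of Grav): format elapsed wall time and peak memory.
function formatRequestStats(float $startedAt, float $endedAt, int $peakBytes): string
{
    return sprintf('%.2f s, %.1f MB peak', $endedAt - $startedAt, $peakBytes / 1048576);
}

$startedAt = microtime(true);

// Runs once the request has finished, so the peak covers the whole page build.
register_shutdown_function(function () use ($startedAt): void {
    error_log(formatRequestStats($startedAt, microtime(true), memory_get_peak_usage(true)));
});
```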

30,000 of the files are small, 200-300 bytes each. Another 1200 are 2-3 KB each. The total is at most 15 MB of text. The rest are demo pages from a few installed themes, so another 10 MB at most.

25 MB user pages total.

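So the memory use is not so much file content as it is Grav overhead: 350 MB per request for about 25 MB of page content, or roughly 11 KB of memory per page across 32,000 pages.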

Another observation: system.cache.enabled was false in the above test. With it enabled, under IIS/FastCGI, the first load is 20 seconds, but the second and subsequent loads drop to 9 seconds. Still unacceptable, obviously.

Additional references to the topic at hand:
Large Website “Rebuilding” Improvements
“Large” site support on Grav v2 roadmap (2017): Roadmap for Grav 2
Scaling issues - 2016

Pages.buildRegularPages(), called on every request, does change detection by walking the file system. 30K folders is too many for that.
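
To show why a walk like that gets expensive, here is a simplified sketch of mtime-based change detection. It is not Grav’s actual implementation, and `treeFingerprint` is a hypothetical name; the point is only that every file and folder costs one stat call per check.

```php
<?php
// Hypothetical sketch (not Grav's code): fingerprint a directory tree from
// the path and modification time of every file and folder beneath $root.
function treeFingerprint(string $root): string
{
    $entries = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // include folders, not only files
    );

    foreach ($iterator as $path => $info) {
        // One stat per entry: with 30K folders this is the per-request cost.
        $entries[] = $path . ':' . $info->getMTime();
    }

    sort($entries); // iteration order depends on the file system
    return md5(implode("\n", $entries));
}
```

If the fingerprint differs from the one stored with the cache, something changed and the cache is stale. The cost grows linearly with the number of entries, whether or not anything changed.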

Temp Workaround:
Set system.cache.check.method to off or none. The cache will then be rebuilt only on a manual reset or when system files change. Page reload is down to 2-3 sec (IIS), so other optimizations may speed that up further. Memory usage is still 350 MB per request; it probably loads the whole cache just to find one page. Idle size is 122 MB.

Consider a FileSystemWatcher for PHP. This is not an uncommon need or problem. Use an event callback (which needs debouncing) and then rebuild the cache on a background thread.
Search DDG
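
The debounce part could look something like the sketch below. It assumes some watcher (not shown, and not something I have found for PHP yet) reports each change; `RebuildDebouncer` is a made-up class that only decides when the file system has been quiet long enough to rebuild.

```php
<?php
// Hypothetical sketch: collapse a burst of change events into a single rebuild.
final class RebuildDebouncer
{
    /** @var float|null time of the most recent unhandled change */
    private $lastEventAt = null;

    /** @var float seconds of quiet required before rebuilding */
    private $quietSeconds;

    public function __construct(float $quietSeconds)
    {
        $this->quietSeconds = $quietSeconds;
    }

    // Call this from the watcher callback for every reported change.
    public function recordChange(float $now): void
    {
        $this->lastEventAt = $now;
    }

    // Returns true once per burst, after $quietSeconds without new changes.
    public function shouldRebuild(float $now): bool
    {
        if ($this->lastEventAt === null || $now - $this->lastEventAt < $this->quietSeconds) {
            return false;
        }
        $this->lastEventAt = null; // the burst has been handled
        return true;
    }
}
```

A background worker would poll shouldRebuild(microtime(true)) and run the cache rebuild when it returns true.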

Cache should only update changed files, not the entire directory tree? Investigate.
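
A rough sketch of what an incremental update would need as input: given two snapshots mapping path to modification time, work out which entries were added, removed, or changed, so only those get reprocessed. `diffSnapshots` is a hypothetical helper and says nothing about how Grav’s cache is structured internally.

```php
<?php
// Hypothetical sketch: compare two path => mtime snapshots of the pages tree.
function diffSnapshots(array $old, array $new): array
{
    return [
        // Paths present only in the new snapshot.
        'added' => array_keys(array_diff_key($new, $old)),
        // Paths present only in the old snapshot.
        'removed' => array_keys(array_diff_key($old, $new)),
        // Paths in both snapshots whose mtime differs.
        'changed' => array_keys(array_diff_assoc(array_intersect_key($new, $old), $old)),
    ];
}

$old = ['/a/item.md' => 1000, '/b/item.md' => 1000, '/c/item.md' => 1000];
$new = ['/a/item.md' => 1000, '/b/item.md' => 2000, '/d/item.md' => 3000];
print_r(diffSnapshots($old, $new));
```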

Other considerations:
Disable all unnecessary plugins at the top of the containing folder.
Disable Twig as well.
AdvancedPageCache - will cache static content. Will it do so by folder/path spec? Has a ‘whitelist’ property that seems to only pertain to files, not folders. Could probably adapt with minor changes.