Twigfeeds RSS Feed Labelling/Categorisation

Took me some time to get back to this; beta.7 is a quick fix that addresses the Admin issues. They were caused by changes or regressions in the selectize field. I’ve been down that road before, and the same problems seem to have returned. Note that, unlike before, both categories and tags are pluralized and defined as lists, so they cannot be string values, and the singular category or tag won’t get picked up.

user/config/plugins/twigfeeds.yaml bears no relation to the cache; it only defines the settings for the parser and is interpreted by Admin for the GUI. I can see the cache issue, however: it occurs because the config is not carried over into the result. I’ll look for a fix shortly. EDIT: Should be fixed with beta.8.

Thanks for this, and your continued great effort to evolve the plugin. Please don’t feel harassed by me. Soon I will have less time also :wink:

Update on tests/experience with beta.6:

  • I’m testing about 25 feeds. I’ll limit it at 30 I think.
  • I’m running categories, tags, a merged full feed and a ‘snapshot’ of 3 random entries from the full feed, plus a source list.
  • All tags & categories behave exactly as expected. Using a dynamically rendered page for tag and one for category to call items works brilliantly.
  • Full feed works great, overall as expected. Pagination works great.
  • Anomaly with caching/refreshing Bluesky feeds (see below)
  • Anomaly while using randomize when calling 3 random items from the merged feed as a small preview of the full feed. (see below)

Bluesky cache_time and lastModified

lastModified is taking the time of the most recent refresh of the BS RSS by Twigfeeds, NOT the original pubDate of the BS item, so it applies the newest refresh date as if it were the pubDate to all the items I’m pulling from BS (3 items each for 2 feeds). This only happens with BS feeds; all other feeds retain the original item pubDate and populate the merged feed etc. as expected. But when BS feeds are refreshed, all 6 items get the same or a very similar new lastModified date, go to the top of the merged feed list, and stay there until another feed is refreshed/updated later. They very quickly return to the top if they have a short or null cache_time.
I tested this by adding cache_time: 3600 to every feed except the two Bluesky feeds, which have cache_time: 21600. This prevents the BS feeds from continuously occupying the top 6 items of the merged feed list. Checking the original BS pubDate in the raw RSS confirms what I’m thinking: lastModified in the case of BS is the time Twigfeeds refreshes the BS feed, and the original pubDate of an item is lost in the Twigfeeds cached data. I hope this is understandable.

Odd randomize behaviour

I’m doing this to create a small preview of random items from the merged feed. When calling the retrievedTitle field, an odd behaviour occurs: the feed name often just does not show up in the template, even though the item always shows. I cannot work out why this is happening. It affects any feed source. I’ve tried numerous configurations of how to call the random 3 entries.

  • For retrievedTitle as
    {% set item = item|merge({ 'retrievedTitle': feed.config.name }) %} or {% set item = item|merge({ 'retrievedTitle': name }) %} both work, but only randomly.

  • For the item call:
    {% for index, item in feed_items|sort_by_key('sortDate')|randomize|slice(0, 3) %}
    Or
    {% for item in feed_items|ksort('retrievedTitle')|randomize|slice(0, 3) %}
    Or
    {% for item in feed_items|sort('sortDate')|randomize|slice(0, 3) %}
    etc

  • The item is called as usual:

   {#  if it has no title, use the description #}
        {% if item.title != true %}
             <li class="">
                <em style="font-weight: 600">{{ item.retrievedTitle }}</em> 
                <a class="feed-desc-url" href="{{ item.link }}">{{ item.content|safe_truncate_html(8)|striptags|raw}}</a>               
                <small>{{ item.lastModified }}</small>
             </li>
        {% else %}         
            {#  otherwise, show the title field #}
             <li class="">
                <em style="font-weight: 600">{{ item.retrievedTitle }}</em> 
                <a class="feed-desc-url" href="{{ item.link }}">{{ item.title }}</a>           
                <small>{{ item.lastModified }}</small>
             </li>
        {% endif %}    

I cannot explain why retrievedTitle only shows up randomly. Any ideas appreciated on this. Maybe I need to do a set random thing but I don’t know how.

I’ll keep updating the shared Obsidian note with other info as I go along.

BlueSky’s lack of conformance and server setup will cause this, and as you’ve discovered, working around it by aligning TwigFeeds’ cache_time is insufficient. The offending line is in Parser.php, used as a fallback for this lack of conformance. The parser library’s getLastModified should handle pubDate, but for the time being this looks like an edge case with BlueSky that would need further debugging.

Thanks for the sort_by_key reminder, and for updating the Obsidian note; I’d forgotten whether Twig can do datetime parsing and comparison by itself. Seemingly not through the native |sort((a, b) => a.lastModified - b.lastModified).

retrievedTitle is not a standardized property, but seemingly based on the pagination example. My best guess would be misalignment between TwigFeeds’ and Twig’s caching, but the use of the underlying properties feed.config.name and name shouldn’t be affected. A trick for debugging is to add in

echo '<script>window.twig_feeds = ' . json_encode($feed_items) . ';</script>';

before line 372 in twigfeeds.php, to view the actual data passed to Twig at runtime. That is taken directly from the raw data, cached or not. Beyond that, all cached data is in cache://twigfeeds or user://data/twigfeeds, and you’d have to compare the files that manifest.json refers to in order to discover any unexpected variations between processing runs.
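
If it helps to rule out the template side, here is a minimal sketch, not a confirmed fix, that attaches retrievedTitle while the merged list is being built and only shuffles afterwards. It assumes twig_feeds is keyed by feed name, with feed.items and feed.config.name available as in the grouping example further down; feed_items and preview are just illustrative variable names.

```twig
{# Build one flat list, tagging every item with the name of its feed. #}
{# Assumption: each feed exposes `items` and `config.name`; falls back to the key. #}
{% set feed_items = [] %}
{% for name, feed in twig_feeds %}
  {% for item in feed.items %}
    {% set tagged = item|merge({ 'retrievedTitle': feed.config.name|default(name) }) %}
    {% set feed_items = feed_items|merge([tagged]) %}
  {% endfor %}
{% endfor %}

{# Shuffle once, after tagging, and keep the first three entries. #}
{% set preview = feed_items|randomize|slice(0, 3) %}
<ul>
  {% for item in preview %}
    <li>
      <em>{{ item.retrievedTitle }}</em>
      <a href="{{ item.link }}">{{ item.title|default(item.link) }}</a>
    </li>
  {% endfor %}
</ul>
```

If the name still goes missing with this ordering, the cause is more likely in the cached data than in the template, which is where the debugging trick above comes in.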

For posterity, the following is an example of flipping the feed-data to group and sort it by taxonomy. Grouping depends on a bit of pre-arranging of metadata:

{# Iterate and find unique tag-values #}
{% set twig_feeds_tags = [] %}
{% for name, feed in twig_feeds %}
  {% for value in feed.config['tags'] %}
    {% if value not in twig_feeds_tags %}
      {% set twig_feeds_tags = twig_feeds_tags|merge([value]) %}
    {% endif %}
  {% endfor %}
{% endfor %}
{# Iterate and find unique category-values #}
{% set twig_feeds_categories = [] %}
{% for name, feed in twig_feeds %}
  {% for value in feed.config['categories'] %}
    {% if value not in twig_feeds_categories %}
      {% set twig_feeds_categories = twig_feeds_categories|merge([value]) %}
    {% endif %}
  {% endfor %}
{% endfor %}

So that you could efficiently regroup, sort, and render them:

{% set twig_feeds_tags_items = [] %}
{% for value in twig_feeds_tags %}
  {% set twig_feeds_filtered = twig_feeds|filter(v => value in v.config.tags) %}
  {% set twig_feeds_filtered_items = [] %}
  {% for name, feed in twig_feeds_filtered %}
    {% set twig_feeds_filtered_items = twig_feeds_filtered_items|merge(feed.items) %}
  {% endfor %}
  <h4>{{ value }} ({{ print_r(twig_feeds_filtered_items|count) }})</h4>
  {% for item in twig_feeds_filtered_items|sort_by_key('lastModified') %}
    <time>{{ item.lastModified }}</time>
    <small>
      <a href="{{ item.link }}">{{ item.title|default(item.link) }}</a>
    </small>
    <br />
  {% endfor %}
{% endfor %}
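
The same regrouping can be sketched for categories. This is only a mirror of the tags block above, assuming twig_feeds_categories was built by the earlier loop; category_items is an illustrative name, and the |default([]) guard is an added assumption for feeds without any categories.

```twig
{# Regroup, sort, and render items per category, mirroring the tags example. #}
{% for value in twig_feeds_categories %}
  {# Keep only the feeds whose config lists this category. #}
  {% set category_feeds = twig_feeds|filter(v => value in v.config.categories|default([])) %}
  {% set category_items = [] %}
  {% for name, feed in category_feeds %}
    {% set category_items = category_items|merge(feed.items) %}
  {% endfor %}
  <h4>{{ value }} ({{ category_items|count }})</h4>
  {% for item in category_items|sort_by_key('lastModified') %}
    <time>{{ item.lastModified }}</time>
    <small>
      <a href="{{ item.link }}">{{ item.title|default(item.link) }}</a>
    </small>
    <br />
  {% endfor %}
{% endfor %}
```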

(forgot to mention, I’ll test beta.7 asap over next few days.)

The offending line is in Parser.php … parser-library’s getLastModified should handle pubDate , but for the time being this looks like an edge-case with BlueSky

Yes, definitely the fault of Bluesky, I can see how it’s behaving. No matter if there are updated items or not, all Bluesky sources appear again at the top whenever Twigfeeds checks the server at cache_time (atm every 6hrs). Every other feed behaves as it should re pubDate and refreshing, so I keep those at 30 minutes.

updating the Obsidian-note

I’ll tidy this up so it’s more logical and usable, for you to see how users encounter and deal with things, and document the issues I come across more clearly.

retrievedTitle is not a standardized property

Yes I know, but I was using that for convenience to call the name or feed.config.name, as per the merged feed code pattern/snippet. I will test the code line in twigfeeds.php and see what I can find out. It’s very odd. This does not affect item randomization at all.

Thanks also for the more expert code on the tag and category sorting and parsing. I do have a simpler version of doing this but will test your advanced way.

btw I just sorted the source list alphabetically, which is a useful thing to also have in this more advanced and dynamic way of using Twigfeeds with a lot of feeds. It’s very simple: {% set twig_feeds = twig_feeds|sort_by_key('name') %}, then calling as normal.
It looks like this currently

A note on overriding a plugin’s blueprints: I have replicated the issue in “Extending a blueprint causes fatal error”, Issue #2458 in getgrav/grav-plugin-admin on GitHub.

Quick update. I’m a bit slow atm on this as I’m sunk in other time-sensitive work (mid Sept deadline). I haven’t tested v7 yet, sorry. But I’m testing how much load the plugin can take: I’m running 36 feeds (3 posts each) and can say no real issues at all aside from slightly slow loading of the website PWA when I’m on a phone. But once it opens after a few seconds’ wait, everything works very fast. On desktop load time is not noticeable. Nearly every feed I’ve used works properly. Obv some feeds update less than others but most are fairly active.

I’ll test beta7 asap.

So I just broke the site - error in twigfeeds "Call to a member function getFeed() on array", Error user/plugins/twigfeeds/classes/Parser.php:124

This breaks the front end completely, but admin is intact. I was doing some work to get a dynamic page for single sources using uri.param and it was partially working but I couldn’t work out why only some of the feeds worked. So, I cleared cache, but instead of using php8.1-cli, I used php8.3-cli (Ionos requires this method of using cli commands and the site configuration/info lists php8.3.24). It was at this exact moment it broke, so I don’t think it’s anything to do with my templates or with user/config/plugins/twigfeeds.yaml, as I removed that to test. I have since updated everything on the site (Grav, the standard plugins, and moved to twigfeeds v5beta7), but it makes no difference.

I also tried testing whether it’s a single feed in the twigfeeds list that broke it (this is possible) but as said, even with no feeds at all, it’s still broken.

I’ll carry on trying to see how to get it back and meanwhile move to beta8.

UPDATES
EDIT: beta8 has cured the problem and accepts my full twigfeeds list. I do think the php8.3 version cli command to clear all caches (inc twigfeeds) did something to cause the error/break, as all was working fine prior to that. (I’m still puzzled about my single feed page not working for some feeds though!!)

EDIT2: The error has re-occurred since, but then disappears and the front end works again. I’m testing on phones and desktop. It may be to do with twigfeeds overloading (and timing out) when it refreshes feeds because there are too many. Atm I’m testing 36. But it had not happened before yesterday.

The plugin itself cannot overload, even running without limits on an endless amount of feeds, but PHP will if time, processing, memory or space is exhausted. The error Call to a member function getFeed() on array, Error user/plugins/twigfeeds/classes/Parser.php:124 is rather the result of an underlying error that was logged, but whose empty result was not handled in the static query method. These errors all arise from your site speaking to the target server, and any errors will be logged in PHP’s error_log rather than Grav’s or TwigFeeds’. For the next beta version I’ll resolve/improve both of these, but you may want to examine that log for any specific feed giving a bad result.

Thanks for the info about twigfeeds ‘load’, and PHP on my server. So, it sounds like a PHP memory limit or timeout. I’ll look into it (PHP error logs are difficult to locate on Ionos). Maybe I should run a local php.ini with a large memory limit? That may help. I think depending on each feed and its server this kind of problem can happen at any time.

FYI This is currently what I have in user/config/plugins/twigfeeds.yaml settings. Let me know if anything should be changed here.

enabled: true
cache: true
static_cache: false
debug: false
log_file: twigfeeds.log
cache_time: 900
pass_headers: true
silence_security: false
request_options:
  allow_redirects: true
  connect_timeout: 30
  timeout: 30
  http_errors: false

The error has not re-occurred since this morning on the phone. All feeds are working and being updated. The Admin for twigfeeds is now working lovely - such great work!

btw I sorted the issue with the single source template calls, so that all feeds now work. But that template was not the cause of the error.

I’ve updated the Obsidian shared note for my own benefit but you may find stuff in there of interest.

You should test with a configuration locally that matches the one on your production-server, to ensure stability is as expected. Enabling Error Logs on IONOS would be worthwhile to help you locate any errors that occur.

What will have more of an impact on performance is spreading the caching out when there are many feeds and/or if they contain a lot of data. A large data-load in itself doesn’t take much to retrieve, parse, and store, but connecting to multiple remote servers in sequence takes time when they are slow to respond.

That is, align the cache_time with the frequency of the sources’ updates, to avoid querying them all at once, and enable pass_headers of course. The default request_options connect_timeout and timeout should be sufficient to avoid waiting unnecessarily for servers that do not respond in a timely fashion. I would perhaps prefer static_cache enabled, to keep the feed data separate from Grav’s cache, making it easier to clear them independently.

Further, you could set up a cronjob to run the CLI to cache routinely, such that the user is not as exposed to re-caches. It will respect your plugin configuration, and so running it frequently will not unnecessarily process anything that would not otherwise run on a visit to the site.

The errors are more plainly apparent in beta.9, and beta.10 adds a more extensive way to test feeds using PHPUnit, as well as sorting out some basic normalization for the direct mode. beta.11 goes a few steps further to add redundancies to handle errors that happen upstream, and to make the CLI easier to use.

Hey. Apologies for not being around. V busy on work stuff.

I need to re-read your last couple of responses bc there’s a lot of info there.

I’ve not had time yet to move to beta10, but have been using the reader a lot. I’m now testing much longer cache times for most sources as I’m trying to get the initial site load to be quicker. Once it’s loaded it’s very fast but it can take 10+ seconds to load. So I’m trying 2hrs, 3hrs, 6hrs, 12hrs for different feeds. I’ve enabled the static cache and will look into the cron setup, as I run crons for another site.

Occasionally I still come across a feed that will break the reader (the array error), e.g. this work-related feed, 3CL Foundation. Very odd as it looks ok. Not a problem to not have it, but thought you might be curious.

I’ll update here when I’ve moved to beta10.

The 3CL Foundation feed fails with the library, because it cannot parse the feed. It does, however, work in direct mode as far as I can tell.

Hi. Sorry I’m not around much atm.

I’m running it at beta 9 (as said) but will update to v10 today and then v11 after a few days.

In beta 9, it won’t run without grav cache: true; it crashes the front end. It gives this error:
"Call to protected method FeedIo\Adapter\Guzzle\Client::request() from scope Grav\Plugin\TwigFeedsPlugin\API\Parser"

It makes no difference whether twigfeeds static cache is enabled or not. So currently I’m running with grav cache enabled and testing if twigfeeds static cache makes any difference to speed. At first it did not appear to, but it can. E.g. just now, the root page feeds loaded but I could not navigate to any feed/source/category, or any other section of the site. I tried logging in from a phone and that was totally unresponsive. I cleared grav cache again and enabled static cache and everything worked. I don’t understand this.

THIS IS CLARIFIED IN THE OBSIDIAN NOTE and also affects beta 11

Also, if I remember correctly, clearing grav cache via cli usually gives a resulting line that twigfeeds cache is cleared. But this is not there now. Maybe I’m remembering incorrectly. (Possible, I’m doing a lot of other things!)

I have had no other problems with any other feeds. All load as expected.

Re 3CL (or other error feeds), I’ll check direct mode more.

I’ll update again once I’ve tested a bit in v10 and v11, unless odd things happen.