
Building an RSS news aggregator with Drupal

An overview of how I created AnimalRights.fyi, a custom news aggregator.

AnimalRights.fyi is a news aggregator I built to pull together RSS feeds from animal rights and vegan news sources into a single location. The goal is to make this information easily accessible while linking to the people and organisations working to reduce animal suffering.

This write-up is both for my own future reference and for anyone else who might find it helpful.

AnimalRights.fyi in action, showing how the feed can be filtered by headline, and how the user can react to items with the emoji icons.

Introduction

The core functionality is based on a View and the Aggregator module. The site also uses the Rate and Voting API modules to enable users to ‘react’ to any news item by tapping on one of the emojis. The theme is a subtheme of Olivero, which ships with Drupal core. I quickly set up Drupal using SiteGround’s app installer. (Handily, the install comes with Composer, Drush and git out of the box.)

Custom modules

As well as the contributed Aggregator and Rate modules, the site uses three custom modules:

‘Cookie Voter’

Cookie Voter alters the Rate module so that it uses cookies instead of IP addresses to track anonymous emoji reactions. With IP-based tracking, everyone on a shared network (an office or a university, say) counts as the same voter, so users can see other people’s reactions appear as their own, which is confusing. BuzzFeed takes a similar approach to mine, storing user reactions in the browser’s localStorage. The Bear blogging platform, on the other hand, appears to use IP addresses to record upvotes. In Bear’s case that’s sensible: votes feed into a list of trending posts, and IP addresses are harder to game than cookies, which can simply be cleared. At AnimalRights.fyi, emoji reactions are just an informal way for users to engage with news stories, so that trade-off doesn’t apply.
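To make the problem concrete, here’s a minimal Python sketch of the two ways to identify an anonymous voter. It isn’t the module’s actual PHP, and it doesn’t model the Rate or Voting API interfaces; the cookie name `anon_voter_id` and the names `voter_key`, `ReactionStore` and `what_bob_sees` are hypothetical. Two people behind one public IP collapse into a single voter when keyed by IP, but stay distinct when each browser gets its own random cookie token:

```python
import secrets

VOTER_COOKIE = "anon_voter_id"  # hypothetical cookie name


def voter_key(cookies, client_ip, use_cookie=True):
    """Return (voter key, cookies to set on the response) for an anonymous user."""
    if not use_cookie:
        # Everyone behind the same router/NAT shares this key.
        return f"ip:{client_ip}", {}
    token = cookies.get(VOTER_COOKIE)
    if token is None:
        # First visit from this browser: mint a random, unguessable token.
        token = secrets.token_hex(16)
        return f"cookie:{token}", {VOTER_COOKIE: token}
    return f"cookie:{token}", {}


class ReactionStore:
    """One reaction per (voter key, news item)."""

    def __init__(self):
        self._reactions = {}

    def react(self, key, item_id, emoji):
        self._reactions[(key, item_id)] = emoji

    def reaction_of(self, key, item_id):
        return self._reactions.get((key, item_id))


def what_bob_sees(use_cookie):
    """Alice reacts to item 1; Bob uses a different browser on the same network."""
    shared_ip = "203.0.113.7"  # documentation-range address
    store = ReactionStore()
    alice, _ = voter_key({}, shared_ip, use_cookie)
    bob, _ = voter_key({}, shared_ip, use_cookie)
    store.react(alice, 1, "heart")
    return store.reaction_of(bob, 1)


print(what_bob_sees(use_cookie=False))  # heart (Alice's reaction shows as Bob's)
print(what_bob_sees(use_cookie=True))   # None
```

The trade-off runs the other way too: a cookie-keyed voter can react again after clearing cookies, which matters for vote counts that drive rankings but not much for casual reactions.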

‘Custom Headline Filter’

Custom Headline Filter provides a Views filter that removes duplicate (or near-duplicate) headlines. These can appear because of malformed RSS feeds, crossposting between websites, or multiple news outlets reporting the same story. Similarity is calculated with PHP’s similar_text function, and the similarity threshold can be adjusted to taste in the Views UI.
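For illustration, here’s a Python port of the approach behind similar_text(), plus a simple de-duplication pass. similar_text() finds the longest common substring, recurses on the text to its left and right, and sums the matched lengths; the percentage is twice that sum divided by the combined length of both strings. This is a sketch rather than the module’s code: PHP compares bytes whereas this compares Unicode code points (so percentages can differ for non-ASCII headlines), and the 80% threshold and the keep-the-first-seen rule in `drop_similar_headlines` are assumptions.

```python
def similar_chars(a: str, b: str) -> int:
    """Number of matching characters between a and b, similar_text()-style."""
    if not a or not b:
        return 0
    # Brute-force search for the first longest common substring.
    best_len = best_i = best_j = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    if best_len == 0:
        return 0
    # Count that substring, then recurse on what lies to its left and right.
    return (best_len
            + similar_chars(a[:best_i], b[:best_j])
            + similar_chars(a[best_i + best_len:], b[best_j + best_len:]))


def similar_percent(a: str, b: str) -> float:
    """Percentage similarity, as similar_text() reports via its third argument."""
    total = len(a) + len(b)
    return similar_chars(a, b) * 2 * 100 / total if total else 0.0


def drop_similar_headlines(headlines, threshold=80.0):
    """Keep a headline only if it is below the threshold against every kept one."""
    kept = []
    for headline in headlines:
        if all(similar_percent(headline, k) < threshold for k in kept):
            kept.append(headline)
    return kept


print(drop_similar_headlines([
    "Cows rescued from farm",
    "Cows rescued from farm!",
    "New vegan cafe opens",
]))
# ['Cows rescued from farm', 'New vegan cafe opens']
```

Comparing each headline against every kept headline is quadratic, which is fine for a page of results but worth bearing in mind for larger listings.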

‘Custom Twig Extensions’

I needed PHP’s preg_replace function in my Twig templates but wanted to avoid installing the Twig Extensions module, so I created Custom Twig Extensions. (I try to install as few modules as possible to keep complexity to a minimum and make site maintenance easier.)
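The module’s code isn’t shown here, but judging by the template below it exposes preg_replace() as a Twig filter called custom_replace(pattern, replacement). In Drupal, a filter like this is typically a class extending Twig’s AbstractExtension that returns a TwigFilter from getFilters(), registered as a service tagged twig.extension. As a rough, hypothetical Python analogue, this sketch shows what such a filter has to do: peel off the PCRE delimiters and modifiers, then run a regex substitution. Only a few modifiers are mapped, bracket-style delimiters aren’t handled, and PHP replacement strings use $1 for backreferences where Python uses \1.

```python
import re

# PCRE modifier letters mapped to Python flags. 'u' needs no flag because
# Python str patterns are already Unicode-aware. Anything else raises KeyError.
PCRE_MODIFIERS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL,
                  "x": re.VERBOSE, "u": 0}


def custom_replace(value: str, pattern: str, replacement: str) -> str:
    """Hypothetical Python analogue of a preg_replace()-backed Twig filter."""
    delimiter = pattern[:1]
    end = pattern.rfind(delimiter) if delimiter else -1
    if end <= 0:
        raise ValueError(f"expected a delimited pattern such as '/.../u': {pattern!r}")
    body, modifiers = pattern[1:end], pattern[end + 1:]
    flags = 0
    for modifier in modifiers:
        flags |= PCRE_MODIFIERS[modifier]
    return re.sub(body, replacement, value, flags=flags)


print(custom_replace("Read the full story…", r"/(?<!\s)…$/u", " […]"))
# Read the full story […]
```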

Custom modules summary

All three modules were largely written by Claude (3.5 Sonnet), though there was a fair amount of back and forth between us. In the case of Cookie Voter, for example, I asked it to refactor some existing Drupal 7 code for Drupal 8+, but it took a lot of prompting to get the desired result. ‘We’ eventually landed on a solution after I fed the AI relevant code from the Rate module’s codebase. It’s a tip I’ll bear in mind when problem solving with a chatbot in future: don’t assume it already ‘knows’ a given codebase. (I had a conversation with ChatGPT about why supplying codebase excerpts might be necessary for more obscure coding problems.)

The View

Here’s a screenshot of the View:

A screenshot of the View which creates the listing of aggregated news items. The output is formed in the Fields section in the left-most column.

Fields section

The Fields section forms the HTML output of the View, with Twig as the templating language. Here’s the markup from the Fields section combined into a single template:

<div class="news-item {{ title_1 }} fade-in-quick">
  <h3 class="headline{% if field_podcast == '1' %} icon-podcast{% endif %}" iid="{{ iid }}"><a href="{{ link }}" target="_blank">{{ title }}</a></h3>

  {# Get first paragraph of description and remove HTML tags #}
  {% set first_paragraph = description|split('</p>')[0]|trim %}
  {% set first_paragraph = first_paragraph starts with '<p>' ? first_paragraph[3:] : first_paragraph %}
  {% set first_paragraph = first_paragraph|striptags|trim %}

  {# Normalise endings: add a missing full stop; turn trailing ellipses and ' ... more' into ' […]' #}
  {% set last_char = first_paragraph|last %}
  {% set last_three = first_paragraph|slice(-3) %}
  {% set last_nine = first_paragraph|slice(-9) %}

  {%
    if last_char not in ['.', '!', '?', '…', ':'] and
    last_three != '[…]' and
    first_paragraph|slice(-2) != '."' and
    first_paragraph|slice(-6) != '&nbsp;' and
    last_nine != ' ... more'
  %}
    {% set first_paragraph = first_paragraph ~ '.' %}
  {% elseif last_char == '…' %}
    {% set first_paragraph = first_paragraph|custom_replace('/(?<!\s)…$/u', ' […]') %}
  {% elseif last_nine == ' ... more' %}
    {% set first_paragraph = first_paragraph|custom_replace('/ \.\.\. more$/', ' […]') %}
  {% elseif last_three == '...' %}
    {% set first_paragraph = first_paragraph|custom_replace('/(?<!\s)\.{3}$/u', ' […]') %}
  {% elseif last_char == ':' %}
    {% set first_paragraph = first_paragraph|custom_replace('/:$/', '.') %}
  {% endif %}

  {# Hide descriptions that include these strings #}
  {% set excluded_patterns = [
    '©',
    'Image courtesy of',
    'Image supplied by',
    'Image credit',
    'The post',
    'Image:',
    'No abstract',
    'If you enjoyed this episode',
    'Published on'
  ] %}

  {% set is_excluded = excluded_patterns|filter(pattern => pattern in first_paragraph)|length > 0 %}

  {% if first_paragraph|length > 5 and not is_excluded %}
    <div class="views-field views-field-description">
      {{ first_paragraph|custom_replace('/&nbsp;/', '')|custom_replace('/&amp;/', '&') }}
      {% if field_include_feed_description == '1' %}
        {{ description_1 }}
      {% endif %}
    </div>
  {% endif %}

  <div class="meta">via {{ field_website }}
    {% if field_donate %}
      <span>
        <span class="time">{{ timestamp }}</span><span> {{ field_donate }}</span>
      </span>
    {% else %}
      <span class="time">{{ timestamp }}</span>
    {% endif %}
  </div>
</div>

<div class="comment fade-in-quick">{{ field_comment }}</div>

Most of the logic here is a bunch of heuristics which tidy up the HTML contained within the RSS feeds, eg standardising the ellipsis style for truncated descriptions, and removing descriptions that don’t provide any value to the reader.
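To make those heuristics easier to follow (and to test), here’s a rough Python port of the description clean-up. It’s a sketch under a few assumptions: a crude regex stands in for Twig’s striptags (which uses PHP’s strip_tags()), the ‘include feed description’ branch is left out, a trailing ‘…’ is treated as terminal punctuation (as the branch that rewrites it implies), and the function returns None where the template would output nothing.

```python
import re

# Descriptions containing any of these strings are hidden (list taken from the template).
EXCLUDED = ["©", "Image courtesy of", "Image supplied by", "Image credit",
            "The post", "Image:", "No abstract", "If you enjoyed this episode",
            "Published on"]


def strip_tags(html):
    """Crude stand-in for Twig's striptags filter."""
    return re.sub(r"<[^>]*>", "", html)


def clean_description(description):
    """Return the tidied first paragraph, or None if it should be hidden."""
    first = description.split("</p>")[0].strip()
    if first.startswith("<p>"):
        first = first[3:]
    first = strip_tags(first).strip()

    if (first[-1:] not in (".", "!", "?", "…", ":")
            and not first.endswith(("[…]", '."', "&nbsp;", " ... more"))):
        first += "."  # add a missing full stop
    elif first.endswith("…"):
        first = re.sub(r"(?<!\s)…$", " […]", first)
    elif first.endswith(" ... more"):
        first = re.sub(r" \.\.\. more$", " […]", first)
    elif first.endswith("..."):
        first = re.sub(r"(?<!\s)\.{3}$", " […]", first)
    elif first.endswith(":"):
        first = first[:-1] + "."

    if len(first) <= 5 or any(p in first for p in EXCLUDED):
        return None
    return first.replace("&nbsp;", "").replace("&amp;", "&")


print(clean_description("<p>Read the full story…</p><p>More text</p>"))
# Read the full story […]
```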

You’ll see in the screenshot that a number of fields are set to ‘hidden’ (excluded from display). A hidden field isn’t output on its own, but its value remains available as a replacement token in the ‘Rewrite results’ section of any field that comes after it. This allows you to combine, or perform logic on, two or more fields at once. The ‘Aggregator feed item: Title’ field, for instance, uses both the ‘Link’ and ‘Podcast?’ fields. The following markup is from the ‘Title’ field configuration under ‘Rewrite results’:

<div class="news-item {{ title_1 }} fade-in-quick">
<h3 class="headline{% if field_podcast == '1' %} icon-podcast{% endif %}" iid="{{ iid }}"><a href="{{ link }}" target="_blank">{{ title }}</a></h3>

Here’s a screenshot of the UI:

Screenshot of the ‘Aggregator feed item: Title’ field UI.
Hidden fields ‘Link’ and ‘Podcast?’ (see inset, taken from the main Views screenshot) are subsequently available in the ‘Aggregator feed item: Title’ field UI.
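Here’s a simplified Python model of that token mechanism, showing why field order matters: a rewrite can only use tokens from its own field and from fields above it. It’s an illustration, not how Views is implemented. Real rewrites are full Twig templates, whereas this handles only plain {{ name }} tokens, substitutes raw values, and (as a choice made for this sketch) leaves unknown tokens in place so ordering mistakes stay visible. The field names echo the View’s, but the data is made up.

```python
import re

# Matches simple Twig-style tokens such as {{ link }}; {% if %} logic is out of scope.
TOKEN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_row(fields, row):
    """Render one result row.

    fields: (name, rewrite, excluded) tuples in the order they appear in the View.
    row: raw field values keyed by field name.
    """
    available = {}  # tokens the current field may use
    output = []
    for name, rewrite, excluded in fields:
        available[name] = row[name]
        value = row[name]
        if rewrite:
            value = TOKEN.sub(lambda m: available.get(m.group(1), m.group(0)), rewrite)
        if not excluded:
            output.append(value)
    return output


row = {"link": "https://news.example/story", "title": "Sanctuary opens"}
title_rewrite = '<a href="{{ link }}">{{ title }}</a>'

print(render_row([("link", None, True), ("title", title_rewrite, False)], row))
# ['<a href="https://news.example/story">Sanctuary opens</a>']
print(render_row([("title", title_rewrite, False), ("link", None, True)], row))
# ['<a href="{{ link }}">Sanctuary opens</a>']
```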

Best practice

It’s probably better practice to create a Twig template file override instead of scattering the template across the View’s GUI as I’ve done here. Using the GUI is great for quickly prototyping Views, but you may later want to port it to a Twig template file so you can see the full template at a glance. (I may do this for my View as part of a project refinement exercise.)
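If you go the template-file route, the override is usually a views-view-fields template named after the View (and optionally the display), placed in the subtheme’s templates directory and followed by a cache rebuild (drush cr). Inside it, individual fields are available as fields.&lt;field_name&gt;.content. The Python helper below just spells out the naming convention for a subset of the file names Drupal checks, most specific first; the View ID news_feed and display ID page_1 are placeholders, since I haven’t given the real ones here.

```python
def views_fields_template_names(view_id, display_id):
    """Candidate override file names for a View's field-row template (a subset)."""
    hook = "views_view_fields"
    suggestions = [
        f"{hook}__{view_id}__{display_id}",  # this View, this display only
        f"{hook}__{view_id}",                # every display of this View
        hook,                                # the base template
    ]
    # Theme hook suggestions use underscores; template file names use hyphens.
    return [s.replace("_", "-") + ".html.twig" for s in suggestions]


for name in views_fields_template_names("news_feed", "page_1"):
    print(name)
# views-view-fields--news-feed--page-1.html.twig
# views-view-fields--news-feed.html.twig
# views-view-fields.html.twig
```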

Having all the code in a single template file also makes it easy to track changes. That said, you can still track changes with the GUI approach by exporting the configuration (drush cex) after making changes and committing the output to git.

Filter the View output

The ‘Filter criteria’ section removes certain items entirely, such as sponsored posts and recipes. ‘Filter headlines (exposed)’ provides a text field the user can type into to filter news items by headline.

Relationships

The View lists Aggregator feed items, not the feeds themselves. The ‘Aggregator feed’ relationship (under Advanced, in the top right of the Views screenshot) lets us include custom fields from each item’s feed. These include ‘Link’ fields for the feed’s website and for a page where the user can donate to, or otherwise support, the website. Here’s a screenshot of /admin/config/services/aggregator/fields:

Screenshot showing custom fields added to Aggregator feeds.
There are a few custom fields added to Aggregator feeds. The View is set to list Aggregator items, as opposed to feeds, but we can output fields from an item’s associated feed by adding a Relationship in the View’s Advanced section.

The ‘field_aggregator_item_rss_item_metadata’ relationship links the View to a custom content type called ‘RSS Item Metadata’, which lets us attach extra metadata to individual feed items. An ‘Entity reference’ field lets us search for the news item we want; we can then add a comment beneath it or pin it to the right-hand column. Here’s the ‘Manage fields’ UI:

Screenshot of the ‘RSS Item Metadata’ content type.
The ‘RSS Item Metadata’ content type allows us to choose a news item and comment on or ‘pin’ it.
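Underneath, a View relationship is a SQL join. The sqlite3 sketch below models the listing with a heavily simplified, hypothetical schema (Drupal’s real aggregator and field tables are laid out differently, with field values in separate tables): feed items are the base table, the feed relationship pulls in the website and donate links, and the metadata relationship pulls in any comment. Both are LEFT JOINs, which is what Views uses unless ‘Require this relationship’ is ticked, so items without a metadata node still appear.

```python
import sqlite3

# Hypothetical, simplified tables standing in for Aggregator feeds, feed items
# and 'RSS Item Metadata' nodes.
SCHEMA = """
CREATE TABLE feed (fid INTEGER PRIMARY KEY, title TEXT, website TEXT, donate TEXT);
CREATE TABLE item (iid INTEGER PRIMARY KEY, fid INTEGER, title TEXT, timestamp INTEGER);
CREATE TABLE item_metadata (nid INTEGER PRIMARY KEY, iid INTEGER, comment TEXT, pinned INTEGER);
"""

# Base table: items. Each relationship becomes a LEFT JOIN.
LISTING = """
SELECT item.title, feed.website, feed.donate, item_metadata.comment
FROM item
LEFT JOIN feed ON feed.fid = item.fid                    -- 'Aggregator feed'
LEFT JOIN item_metadata ON item_metadata.iid = item.iid  -- entity reference
ORDER BY item.timestamp DESC
"""


def load_listing():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO feed VALUES (1, 'Example News', "
                 "'https://news.example', 'https://news.example/donate')")
    conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)",
                     [(10, 1, "Sanctuary opens", 200), (11, 1, "Ban passes", 100)])
    conn.execute("INSERT INTO item_metadata VALUES (1, 10, 'Great news!', 0)")
    rows = conn.execute(LISTING).fetchall()
    conn.close()
    return rows


for row in load_listing():
    print(row)
# ('Sanctuary opens', 'https://news.example', 'https://news.example/donate', 'Great news!')
# ('Ban passes', 'https://news.example', 'https://news.example/donate', None)
```

One consequence of the join model: if two metadata nodes referenced the same item, that item would appear twice, so it helps to keep to one metadata node per item.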

Other settings

‘Use AJAX’ (under ‘Other’) is set to ‘Yes’. This allows pagination and filtering (‘Filter by headline’) without reloading the entire page.

Conclusion

This post is an overview rather than a step-by-step tutorial. I may explore specific aspects of the website’s functionality in more depth in future posts.

Concerning Drupal as a platform, I wouldn’t necessarily recommend it for side projects like this. Drupal is complex, which is fine because it’s powerful; but managing the system (mainly running regular updates and managing configuration) can be a massive ball-ache. I only chose it for AnimalRights.fyi because I already build Drupal sites professionally. (And LLMs like Claude and ChatGPT make working with Drupal and similarly complex platforms much easier.)