RSS filter

A recommendation system for RSS feeds, based on the articles a user has read. Feed URLs are replaced with backend URLs, and the backend filters out unwanted items and tracks which articles the user reads. It uses LLM embeddings and machine learning to recommend similar articles.

At its core, this is a simple RSS filter that removes unwanted items from a feed. It is written in Python and uses the feedparser library to parse feeds.

It works by tracking the user's read articles, computing their embeddings, clustering them, and then recommending similar articles from the feed. It also includes random articles from the feed to allow discovery of new topics. Recommendations start only after a user has read a few articles (10 by default).
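The pipeline described above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the function names, the tiny cosine k-means, the parameters `k`, `top_n`, and `n_random`, and the choice to pass the feed through unfiltered below the read threshold are all assumptions made for the example. Embeddings are taken as given lists of floats.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iterations=10, seed=0):
    """Tiny k-means using cosine similarity; returns the cluster centroids."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in vectors:
            best = max(range(len(centers)), key=lambda i: cosine(v, centers[i]))
            groups[best].append(v)
        # Keep the previous center when a cluster ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


def recommend(read, candidates, k=2, top_n=3, n_random=1, min_read=10, seed=0):
    """Return candidate ids: the top_n closest to a read cluster, plus n_random others.

    read: list of embeddings of read articles.
    candidates: dict mapping article id -> embedding.
    """
    if len(read) < min_read:
        # Assumption: with too little history, the feed is left unfiltered.
        return list(candidates)
    centers = kmeans(read, min(k, len(read)), seed=seed)

    def score(article_id):
        return max(cosine(candidates[article_id], c) for c in centers)

    ranked = sorted(candidates, key=score, reverse=True)
    picked, rest = ranked[:top_n], ranked[top_n:]
    # Random extras give the user a chance to discover new topics.
    picked += random.Random(seed).sample(rest, min(n_random, len(rest)))
    return picked
```

Ranking against cluster centroids rather than a single mean keeps distinct interests separate, so a reader of both cooking and compilers is not recommended the average of the two.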

Embedding models allow for a new era of recommendation systems in which a large user base is not required, since recommendations are based on the content of the articles rather than on other users' behavior.

Self-hosting

You can self-host this project by running the following commands:

cp .env.example .env
docker-compose -f docker-compose.yaml up

If you don't have a GPU, or don't want to use it, first run:

sed -i 's/^.*devices:.*$/#&/' docker-compose.yaml

Test it with:

curl -X 'GET' \
  'http://localhost/api/v1/feed/1/https%3A%2F%2Fnews.ycombinator.com%2Frss' \
  -H 'accept: application/json'
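The request above embeds the original feed URL, percent-encoded, as the last path segment. A small sketch of building such a URL in Python follows. The helper name `filtered_feed_url` is hypothetical, and treating the `1` segment as a user identifier is an assumption based on the example URL.

```python
from urllib.parse import quote


def filtered_feed_url(base_url: str, user_id: str, feed_url: str) -> str:
    """Build a backend feed URL with the original feed URL as one path segment."""
    # safe="" also encodes ":" and "/", which quote() would otherwise leave as-is.
    encoded = quote(feed_url, safe="")
    return f"{base_url.rstrip('/')}/api/v1/feed/{user_id}/{encoded}"


print(filtered_feed_url("http://localhost", "1", "https://news.ycombinator.com/rss"))
# http://localhost/api/v1/feed/1/https%3A%2F%2Fnews.ycombinator.com%2Frss
```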

To use the self-hosted frontend, set apiBaseUrl in frontend/static/app.js to the backend URL.

Security

SSRF Protection

RSS Filter includes built-in protection against Server-Side Request Forgery (SSRF) attacks. When fetching feeds, the application automatically blocks requests to:

Private IP ranges (192.168.x.x, 10.x.x.x, 172.16.x.x-172.31.x.x)
Localhost (127.x.x.x, ::1)
Link-local addresses (169.254.x.x)
AWS metadata service (169.254.169.254)

This protection is enabled by default, so basic setups need no configuration. The application validates the DNS resolution of every request, including redirects.
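The kind of check described above can be sketched with the standard library. This is an illustration under assumptions, not the project's code: the function names are hypothetical, and the checks use the `ipaddress` module's built-in classifications, which cover the ranges listed (the metadata address 169.254.169.254 falls inside the link-local range).

```python
import ipaddress
import socket


def is_blocked_address(ip: str) -> bool:
    """True for private, loopback, and link-local addresses."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def host_is_allowed(hostname: str) -> bool:
    """True only if the hostname resolves and every resolved address is public."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return bool(infos) and all(not is_blocked_address(info[4][0]) for info in infos)
```

A fetcher built on a check like this would need to repeat it for every redirect target, and to connect to the address that was validated rather than resolving the name a second time.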

Enhanced Security with Proxy (Optional)

For additional security in production environments, you can route all feed requests through a proxy that only allows external hosts:

# docker-compose.yaml
services:
  backend:
    environment:
      FEED_PROXY: http://gluetun:8888  # Your proxy service

Using a proxy provides defense-in-depth by enforcing network-level isolation.
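As a rough sketch of how a fetcher might honor the `FEED_PROXY` variable, the snippet below uses only the standard library. The helper names are hypothetical, and the HTTP client the backend actually uses is not stated here, so `urllib` is an assumption made for the example.

```python
import os
import urllib.request


def proxy_mapping(env=os.environ) -> dict:
    """Map both URL schemes to FEED_PROXY, or return {} when it is unset."""
    proxy = env.get("FEED_PROXY", "").strip()
    return {"http": proxy, "https": proxy} if proxy else {}


def feed_opener(env=os.environ) -> urllib.request.OpenerDirector:
    """Build an opener that routes requests through FEED_PROXY when configured."""
    # An explicit mapping overrides system proxy settings; {} means direct connections.
    handler = urllib.request.ProxyHandler(proxy_mapping(env))
    return urllib.request.build_opener(handler)
```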

Architecture

The application consists of several services:

backend: FastAPI application serving the API
frontend: Static web frontend
redis: Message queue for background jobs
rq-worker: Background workers for feed fetching and embeddings
rq-worker-gpu: GPU-enabled worker for computing embeddings
scheduler: Handles all periodic tasks (replaces external cron jobs)
proxy: Traefik reverse proxy

Scheduled Tasks

The scheduler service handles all periodic tasks automatically:

| Task | Schedule | Description |
| --- | --- | --- |
| `fetch_all_feeds` | Hourly | Fetches all feeds for active users |
| `run_full_maintenance` | Daily 4am UTC | Cleanup old articles, vacuum database |
| `retry_disabled_feeds` | Weekly Sunday 3am UTC | Retry feeds that were disabled due to errors |

No external cron jobs are required.
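To make the schedule concrete, here is a small sketch that computes the next run time for each kind of entry. It does not reflect how the scheduler service is implemented; the function names are hypothetical, and running hourly tasks at the top of the hour is an assumption.

```python
from datetime import datetime, timedelta, timezone


def next_hourly(now: datetime) -> datetime:
    """Next top of the hour strictly after now."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def next_daily(now: datetime, hour: int) -> datetime:
    """Next occurrence of hour:00 strictly after now."""
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    return run if run > now else run + timedelta(days=1)


def next_weekly(now: datetime, weekday: int, hour: int) -> datetime:
    """Next occurrence of weekday (Monday=0 ... Sunday=6) at hour:00 after now."""
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    run += timedelta(days=(weekday - now.weekday()) % 7)
    return run if run > now else run + timedelta(days=7)


now = datetime.now(timezone.utc)
print("fetch_all_feeds:", next_hourly(now))
print("run_full_maintenance:", next_daily(now, hour=4))
print("retry_disabled_feeds:", next_weekly(now, weekday=6, hour=3))
```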

CLI Commands

The backend includes a CLI for manual operations:

# Inside the backend container
python -m app.cli --help

# Available commands:
python -m app.cli fetch-feeds      # Manually trigger feed fetching
python -m app.cli retry-feeds      # Re-enable and retry disabled feeds
python -m app.cli maintenance      # Run full maintenance cycle
python -m app.cli stats            # Show database statistics
python -m app.cli freeze-users     # Freeze dormant users
python -m app.cli unfreeze USER_ID # Unfreeze a specific user
python -m app.cli clean-articles   # Delete old unread articles
python -m app.cli clean-embeddings # Remove old embeddings
python -m app.cli vacuum           # Vacuum and analyze database

Development

Backend

Dependencies

To install the required libraries, run the following command from the backend or frontend directory:

pip install -r requirements.txt

Running the backend

cd backend
python -m uvicorn app.main:app --reload --log-level debug --port 8000

Contributing

The hooks in .pre-commit-config.yaml ensure that:

the pip-compile output is up to date with added dependencies
code is formatted and linted with ruff and black

You can install these hooks with pre-commit install and run them on demand with pre-commit run --all-files.

Contact

If you have any questions, feel free to contact me at m0wer at autistici dot org.
