Published: October 01, 2024 by Lucas Rolff
How PerfGrid Shields Your Inbox: Inside Our Rspamd-Powered Spam Filtering Solution
Spam and the impact it has on customers
At PerfGrid, we host thousands of email accounts, many for businesses. Business email counts mean a higher risk of phishing, loss of productivity, and cluttered inboxes. We aim to reduce the number of spam messages that reach customer inboxes. Spam isn't just annoying; it often carries phishing, scams, or even viruses.
While filtering spam, we must ensure legitimate emails are not missed. We don't want essential messages ending up in the spam folder. Achieving this balance is tricky because spam has become more sophisticated. Sometimes, spam emails look better than genuine ones. Advanced filtering techniques are crucial, helping us minimize false positives while catching more spam.
Our path to Rspamd
We started with cPanel, which uses SpamAssassin for spam filtering. SpamAssassin catches obvious spam but has limitations, and extending its features isn't straightforward within cPanel.
Years ago, we switched to SpamExperts®, a commercial solution with much better spam detection than SpamAssassin. However, it was costly, making it impractical for large-scale deployment.
Later, SpamExperts was acquired, and its quality declined, while prices rose. So, we explored other options. We tried Proxmox Mail Gateway (PMG), which also uses SpamAssassin. While it improved spam detection, it came with more false positives and higher administrative overhead.
Eventually, we decided to try Rspamd. It's a high-performance spam filter, capable of detecting spam campaigns quickly through its real-time fuzzy system and extensive RBL lists. Rspamd also allows for easy extension using Lua.
After thorough testing, we replaced PMG with a combination of Rspamd and Postfix. We've since extended Rspamd with custom Lua scripts and additional spam signatures to better target the emails we receive.
How we implemented Rspamd into our infrastructure
Whether our customers use cPanel or our in-house control panel for Grid Hosting, emails get stored on backend servers. For cPanel, this is Exim; for Grid Hosting, we use Postfix.
Initially, we used Rspamd only with our cPanel offerings. Rspamd was set up separately and relayed filtered emails to their destination. Being external was beneficial because it kept spam filtering outside of cPanel and allowed us to use Postfix as an intermediate MTA (Mail Transfer Agent). Postfix, with its Postscreen feature, handles connection floods better than Exim.
As we developed our in-house control panel, we streamlined everything. All emails, regardless of destination, pass through the same Rspamd system. Being centralized reduces costs and improves filtering accuracy by working with a larger dataset.
Our infrastructure setup allows for outages by remaining fully operational even if a single server dies, whether an Rspamd server, KV server, or DNS server. We run two independent Rspamd/Postfix servers in different data centers and networks.
Both servers communicate with external services like our hosting-panel.net API, Rspamd fuzzy storage, RBLs, and DNSBLs. They connect to a Valkey Sentinel cluster across three data centers and networks. This setup allows consistent filtering, regardless of which server processes the email.
Recently, we added a local Large Language Model (LLM) compatible with OpenAI's API. This AI helps us analyze emails we're uncertain about. While it doesn't play a big role yet, it assists the filtering engine in making better decisions.
Security and privacy
Security is a priority in our Rspamd setup. From when an email arrives at our filtering servers to when it reaches the backend, all communication happens over an encrypted network using a self-hosted version of Tailscale. Using Tailscale private networking ensures encrypted data transmission within our system. Emails forwarded to the backend are sent over the internet using at least TLSv1.2, ensuring encryption.
We take privacy seriously. Raw emails never get sent to third parties. While DNS queries are performed (for example, checking spam domains via DNS RBLs), only domains —not full URLs— are shared.
Rspamd uses "maps" to store known spam domains and phrases, downloaded from public lists and queried locally. Fuzzy storage works by hashing parts of the email content; only those hashes are shared externally, keeping the content private. It's worth noting that hashes are undirectional, meaning you cannot restore the original text from the hash.
Looking at the future
We continue improving our spam filtering system, boosting spam detection while minimizing false positives. There's a lot in development, and as spam tactics evolve, so will our filtering capabilities.
In our next blog post, we'll discuss how we handle outbound spam filtering. While we still use Rspamd, we've developed custom detection rules to prevent spam from leaving our systems. More on that soon!