Spam Filter Configuration: How to Stop Spam Without Blocking Legitimate Email

A spam filter that lets everything through defeats its purpose. A spam filter that blocks too aggressively costs your organization legitimate email — missed job applications, blocked customer inquiries, lost invoices. The goal is precision: reliably stopping spam and phishing while maintaining a false positive rate low enough that users trust the system. This guide covers the mechanisms available, how they interact, and how to calibrate them for your environment.

Understanding the Filtering Stack

Modern spam filtering is not a single check but a pipeline of mechanisms, each adding signal. A message typically passes through:

Connection-level checks — is the sending IP on a blocklist? Does it have a valid PTR record? Is the domain newly registered?
Authentication checks — do SPF, DKIM, and DMARC pass? Is the From: header aligned?
Reputation checks — does the sending IP or domain have a history of spam?
Content analysis — do the message body, subject, and headers match known spam patterns?
Behavioral analysis — does this message pattern match other recently seen spam campaigns?
User feedback signals — have other users reported similar messages?

Each mechanism has false positive and false negative rates. A well-configured system combines them using a scoring approach rather than treating any single check as a pass/fail gate.

Authentication as a Filtering Signal

SPF, DKIM, and DMARC failures are strong negative signals but should rarely be used as hard blocks on their own (especially SPF failure, which is common with forwarded email). Use them as spam score contributors instead.

SpamAssassin scoring example

SpamAssassin, one of the most widely deployed open-source spam filters, uses rules with point values:

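# Authentication-related rules with illustrative scores.
# Rule names and default scores vary by SpamAssassin version, enabled plugins,
# and score set; check the rule files shipped with your installation.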
MISSING_MX_RECORD          0.9   # No MX record for sender domain
SPF_FAIL                   0.9   # SPF hard fail
SPF_SOFTFAIL               0.1   # SPF soft fail
DKIM_ADSP_ALL              1.0   # Domain says all mail should be signed, but isn't
DKIM_ADSP_DISCARD          2.0   # Domain says discard unsigned mail
DMARC_REJECT               3.0   # DMARC policy is reject and message fails

A message that fails SPF gets 0.9 points, well short of the default spam threshold of 5.0 on its own. Combined with other signals (a suspicious subject line, a mismatched display name, a URL in the body pointing to a newly registered domain), the cumulative score can cross the threshold, at which point the message is flagged as spam and quarantined.
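The additive scoring described above can be sketched in a few lines. This is a minimal illustration, not SpamAssassin's implementation: SPF_FAIL uses the score from the table above, while the other three rule names and their scores are hypothetical stand-ins for content and URL rules.

```python
# Illustrative only: SPF_FAIL uses the score listed above; the other rule
# names and scores are hypothetical stand-ins for content and URL rules.
RULE_SCORES = {
    "SPF_FAIL": 0.9,
    "SUSPICIOUS_SUBJECT": 1.5,      # hypothetical
    "DISPLAY_NAME_MISMATCH": 1.8,   # hypothetical
    "URL_NEW_DOMAIN": 2.2,          # hypothetical
}

SPAM_THRESHOLD = 5.0  # matches the default threshold discussed in this guide


def total_score(rules_hit):
    """Sum the scores of every rule that fired; unknown rules count as 0."""
    return sum(RULE_SCORES.get(rule, 0.0) for rule in rules_hit)


def is_spam(rules_hit, threshold=SPAM_THRESHOLD):
    """A message is spam when its cumulative score reaches the threshold."""
    return total_score(rules_hit) >= threshold
```

With these assumed scores, an SPF failure alone stays far below the threshold, while all four signals together exceed it. That is the point of scoring: no single weak signal decides the outcome.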

Configuring DMARC policy interaction

If you operate your own mail server, configure it to respect DMARC p=reject policies. One way to do this with Postfix is the opendmarc milter. On Debian or Ubuntu:

apt-get install opendmarc

/etc/opendmarc.conf:

AuthservID mail.yourcompany.com
TrustedAuthservIDs mail.yourcompany.com
RejectFailures true
IgnoreAuthenticatedClients true
RequiredHeaders true
Syslog true
UMask 022
UserID opendmarc

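With RejectFailures true, mail from domains with p=reject that fails DMARC is rejected at the SMTP level rather than delivered to spam. This enforces senders' published policies, but it is a hard block: legitimate messages that break DMARC in transit, such as some mailing-list traffic, are rejected rather than quarantined, so watch the logs after enabling it.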

Blocklist Integration

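Real-time Blocklists (RBLs) and DNS-based Blocklists (DNSBLs) are databases of known spam-sending IPs and domains, queryable via DNS. A mail server reverses the octets of the connecting IP and prepends them to the blocklist zone: for 192.0.2.99 and zen.spamhaus.org, it looks up 99.2.0.192.zen.spamhaus.org. An address record in the answer means the IP is listed; a "no such domain" response means it is not.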
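As a small sketch of the query-name construction for IPv4, the helper below builds the DNS name a mail server would look up. The function name is hypothetical, and the sketch deliberately stops short of performing the DNS query.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org"))
```

Resolving the resulting name with your usual DNS resolver completes the check.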

Common blocklists to integrate

Spamhaus ZEN (zen.spamhaus.org) — combines SBL, XBL, and PBL; covers spam sources, exploited hosts, and dynamic IP ranges
Spamhaus DBL (dbl.spamhaus.org) — domain blocklist for URLs found in spam messages
Barracuda Reputation Block List (b.barracudacentral.org)
SURBL (multi.surbl.org) — URIs in spam messages

Configuring RBL checks in Postfix

# /etc/postfix/main.cf
smtpd_recipient_restrictions =
  permit_mynetworks,
  permit_sasl_authenticated,
  reject_unauth_destination,
  reject_rbl_client zen.spamhaus.org,
  reject_rhsbl_sender dbl.spamhaus.org,
  permit

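Use hard rejects with RBLs carefully. Spamhaus ZEN includes the Policy Block List (PBL), which covers large residential and dynamic IP ranges that are not expected to send mail directly; small legitimate senders, such as self-hosted servers on residential connections, sometimes fall inside them. To reduce false positives, consider using RBLs as score contributors in SpamAssassin rather than as hard rejects in Postfix.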

Greylisting

Greylisting exploits the fact that spam-sending systems typically do not retry failed deliveries, while legitimate mail servers do. When an unknown sender connects for the first time, the mail server issues a temporary rejection (a 4xx SMTP reply, typically 450 or 451). A legitimate MTA retries after a few minutes and is accepted. A spam bot moves on.

Implementation with Postfix and Postgrey

apt-get install postgrey

/etc/postfix/main.cf:

smtpd_recipient_restrictions =
  permit_mynetworks,
  permit_sasl_authenticated,
  reject_unauth_destination,
  check_policy_service inet:127.0.0.1:10023,
  permit

By default, Postgrey greylists each new (client IP, sender address, recipient address) triple for 5 minutes. Whitelisted clients (configured in /etc/postgrey/whitelist_clients) bypass greylisting.
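The core decision can be sketched as a toy in-memory table keyed on that triple. This is not Postgrey's code: the class and method names are hypothetical, and real implementations also expire old entries, persist state, and auto-whitelist clients that retry reliably.

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, delay_seconds=300, clock=time.time):
        self.delay_seconds = delay_seconds
        self.clock = clock        # injectable so the logic can be tested
        self.first_seen = {}      # triple -> timestamp of the first attempt

    def check(self, client_ip, sender, recipient):
        """Return 'defer' (temporary rejection) or 'accept'."""
        triple = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember when this triple was first seen; keep the old time if known.
        first = self.first_seen.setdefault(triple, now)
        if now - first < self.delay_seconds:
            return "defer"    # first attempt, or a retry that came too soon
        return "accept"       # the sender retried after the delay elapsed
```

A sender that never retries is never accepted, which is the whole mechanism; a sender that retries after the delay is let through on that retry.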

Greylisting trade-offs

Greylisting typically adds 5–15 minutes to first-time email delivery, depending on how soon the sending server retries. This is acceptable for most email but problematic for time-sensitive communications (password reset links with short expiry, two-factor authentication codes). Whitelist the sending services that deliver time-sensitive mail:

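# /etc/postgrey/whitelist_clients
# Entries are matched against the connecting client's reverse-DNS hostname.
# A bare domain also matches its subdomains, so no leading dot is needed.
google.com
amazonses.com
sendgrid.net
mailchimp.com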

Greylisting is less effective than it was in 2010 because many spam operations now use distributed sending infrastructure that retries correctly. It still catches a meaningful portion of low-sophistication spam.

Content Filtering

Content filters analyze the message body, subject line, HTML structure, URLs, and attachments for spam characteristics. SpamAssassin uses a rule-based approach; commercial filters add machine learning on top.

Key SpamAssassin configuration levers

The spam threshold (default 5.0) determines when a message is classified as spam. Adjust in /etc/spamassassin/local.cf:

# Lower threshold means more aggressive filtering
required_score 4.5

# Enable Bayes classifier (learns from user feedback)
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12.0

# Enable network tests (RBL lookups, URL reputation)
skip_rbl_checks 0

Training the Bayes classifier

The Bayes classifier improves significantly with training data. Feed it correctly classified examples:

# Train on spam (a directory of messages; add --mbox if the path is an mbox file)
sa-learn --spam /path/to/spam/mailbox

# Train on legitimate mail
sa-learn --ham /path/to/ham/mailbox

# Check database status
sa-learn --dump magic

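By default, SpamAssassin does not apply Bayes scores until it has learned at least 200 spam and 200 ham messages. Beyond that, accuracy generally improves with more examples, provided they are correctly labeled and representative of your current mail.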
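To show why labeled examples matter, here is a deliberately simplified classifier: plain naive Bayes over whitespace-separated words with add-one smoothing. It is a swapped-in teaching technique, not SpamAssassin's Bayes implementation, which tokenizes and combines probabilities differently; the class name and training phrases are made up for the sketch.

```python
import math
from collections import Counter


class TinyBayes:
    """Plain naive Bayes over word tokens with add-one smoothing."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record one correctly labeled message ('spam' or 'ham')."""
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return P(spam | text) under the naive independence assumption."""
        vocab = set(self.tokens["spam"]) | set(self.tokens["ham"])
        total_msgs = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            total = sum(self.tokens[label].values())
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            for tok in text.lower().split():
                count = self.tokens[label][tok]
                score += math.log((count + 1) / (total + len(vocab) + 1))
            log_score[label] = score
        # Clamp the difference so math.exp cannot overflow on long messages.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1 / (1 + math.exp(diff))
```

After learning a handful of spam and ham messages, words seen mostly in spam pull the probability above 0.5 and words seen mostly in ham pull it below. Mislabeled training messages push those counts the wrong way, which is why the training mailboxes need to be clean.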

Quarantine Policy Design

Rather than deleting everything that looks like spam, quarantine messages that fall short of high confidence. Users can review quarantined messages and recover false positives, and your security team can analyze them for threat intelligence.

Quarantine thresholds

A common configuration:

Score 0–4.9: Deliver normally
Score 5.0–7.9: Deliver to Junk/Spam folder
Score 8.0–12.9: Quarantine (user can request release)
Score 13.0+: Delete (high-confidence spam or phishing)
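The tiers above can be expressed as a small mapping function. This is a sketch with hypothetical action names; it uses half-open ranges so that a score such as 7.95 falls into a tier instead of between two.

```python
def disposition(score):
    """Map a spam score to the action tiers listed above."""
    if score < 5.0:
        return "deliver"
    if score < 8.0:
        return "junk_folder"
    if score < 13.0:
        return "quarantine"
    return "delete"
```

Keeping the boundaries in one place like this makes it easier to adjust a single threshold and see which tier absorbs the change.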

User quarantine reports

Send users a daily digest of their quarantined messages so they can request release of false positives without contacting IT for every case. MailScanner, amavisd-new, and commercial platforms all support quarantine digest emails.

Phishing messages that are definitively identified (by DMARC p=reject failures or malware detection) should not be releasable by users — only by security administrators.

Avoiding False Positives

False positives — blocking legitimate email — are the most common reason organizations loosen spam filtering to the point of ineffectiveness. Address them systematically:

Identify recurring false positive sources: Review what gets caught in spam or quarantine. If a particular partner's domain is consistently flagged, investigate why (broken SPF, no DKIM, suspicious subject patterns) and either fix the root cause or allowlist them explicitly.

Allowlisting: Use domain-level allowlisting sparingly for trusted business partners whose mail consistently fails content scoring. Check the allowlisted domain's authentication records first: a domain with no SPF, DKIM, or DMARC is easy to spoof, and an allowlist entry would let that spoofed mail bypass content filtering.

Train users to report false positives: Make it easy for users to flag misclassified messages. Feed those reports back to your Bayes classifier and spam filter configuration.

Monitor delivery metrics: If users from a specific department start complaining about missing emails, investigate the spam filter logs for that time period. A complaint spike that starts right after a filter rule update usually points to that rule as the false positive source.
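Finding recurring false positive sources is mostly a counting exercise. The sketch below assumes you can export the sender addresses of messages that users released from quarantine; the function name and input format are hypothetical, not tied to any particular quarantine product.

```python
from collections import Counter


def top_false_positive_domains(released_senders, limit=5):
    """Count sender domains among messages users released from quarantine."""
    domains = Counter()
    for address in released_senders:
        _, sep, domain = address.rpartition("@")
        if sep and domain:             # skip entries that are not addresses
            domains[domain.lower()] += 1
    return domains.most_common(limit)
```

Domains at the top of this list are the first candidates for a root-cause check of their SPF and DKIM setup, or for a carefully scoped allowlist entry.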

Spam filtering is not set-and-forget. Review quarantine trends monthly and adjust thresholds and rules as spam patterns evolve.