Using Bayesian spam filtering

The technique of Bayesian filtering is described by this article. Our e-mail system integrates SpamAssassin, which combines Bayesian filtering with its regular battery of tests. Note: If spam filtering is not yet configured for your e-mail, please read "Using the Spam Filter" first.

Automatic learning

By default, SpamAssassin automatically trains the Bayes filter based on the results of its battery of tests. To disable this behavior, add the line bayes_auto_learn 0 to ~/.spamassassin/user_prefs (SpamAssassin preference files use "option value" lines separated by whitespace, without an equals sign).
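The change can also be made from the shell. The following is a minimal sketch, not an official tool: disable_bayes_auto_learn is a hypothetical helper name, and it assumes the preferences file uses the usual whitespace-separated "option value" lines.

```shell
# Hypothetical helper: turn off Bayes auto-learning in a SpamAssassin
# preferences file (normally ~/.spamassassin/user_prefs).
disable_bayes_auto_learn() (
    prefs="$1"
    mkdir -p "$(dirname "$prefs")"
    touch "$prefs"
    # Drop any existing bayes_auto_learn line, then append the new value,
    # so that running the helper twice still leaves a single setting.
    grep -v '^[[:space:]]*bayes_auto_learn[[:space:]]' "$prefs" > "$prefs.tmp" || true
    echo 'bayes_auto_learn 0' >> "$prefs.tmp"
    mv "$prefs.tmp" "$prefs"
)
```

It would be called as disable_bayes_auto_learn ~/.spamassassin/user_prefs. Running spamassassin --lint afterwards should report any syntax problem in the preferences.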

Training the filter with the Spam/ folder

Whenever a mail user moves a message to the Spam/ folder, the server automatically learns that message as spam. Moving a message from the Spam/ folder back to the Inbox has the reverse effect: the message is learned as non-spam. The precise action the server takes when a message is moved between folders is given by the following matrix (rows are the destination folder, columns the source folder):

Destination \ Source   `Spam/`            `Trash/`      `Quarantine/`      Inbox / other
`Spam/`                -                  -             Mark as spam       Mark as spam
`Trash/`               -                  -             (Forbidden)        -
`Quarantine/`          (Forbidden)        (Forbidden)   -                  (Forbidden)
Inbox / other          Mark as not spam   -             Mark as not spam   -
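
The matrix can be read as a simple lookup on the (source, destination) pair. The sketch below is only an illustration of that reading, not the server's code: the function name move_action and the class names spam, trash, quarantine and other are made up here, and blank cells are interpreted as "no learning action", which is an assumption.

```shell
# Sketch of the folder-move matrix as a lookup function.
# Arguments: source class, destination class, each one of
# spam, trash, quarantine, other (illustrative names).
# Prints: learn-spam, learn-ham, forbidden, or none.
move_action() {
    case "$1:$2" in
        quarantine:spam|other:spam)
            echo learn-spam ;;
        spam:other|quarantine:other)
            echo learn-ham ;;
        quarantine:trash)
            echo forbidden ;;
        spam:quarantine|trash:quarantine|other:quarantine)
            echo forbidden ;;
        *)
            # Blank cell in the matrix: assumed to mean no learning.
            echo none ;;
    esac
}
```

For example, move_action other spam corresponds to dragging a message from the Inbox into Spam/.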

Some IMAP client applications use non-standard names for the Spam/ and Trash/ folders. The server tries to deal with this by also recognizing common patterns such as Junk/, Deleted Items/, case variations and common translations such as Courrier Indésirable/ and Éléments supprimés/.
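
To make the idea of alias recognition concrete, here is a small sketch of how folder names could be mapped to a folder class. It is purely illustrative: folder_class is a hypothetical name, the alias list contains only the examples mentioned above, and the server's actual pattern list is not documented on this page.

```shell
# Illustrative only: map a client-specific folder name to a folder class.
# The aliases are just the examples from the text, not the server's list.
folder_class() (
    # Strip a trailing slash and lowercase ASCII letters for comparison.
    # Non-ASCII letters (such as "É") are not case-folded by this sketch.
    name=$(printf '%s' "${1%/}" | tr 'A-Z' 'a-z')
    case "$name" in
        spam|junk|"courrier indésirable")
            echo spam ;;
        trash|"deleted items"|"Éléments supprimés"|"éléments supprimés")
            echo trash ;;
        quarantine)
            echo quarantine ;;
        *)
            echo other ;;
    esac
)
```

With this sketch, folder_class "Deleted Items/" and folder_class "TRASH" both print trash, while an unrecognized name falls through to other.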

Training the filter manually

You can also train the Bayes filter manually with the sa-learn command on the server.

To run these commands, make sure you are logged in to the correct mail account on the active mail server. All of your mail accounts are listed in the Control Panel (under Mail / Mailbox Accounts). For example, if the account name is yourname and your mail server is mail123.csoft.net, log in with:

  $ ssh yourname@mail123.csoft.net

The sa-learn utility can read individual message files or entire folders. Use the --spam argument to indicate that the input is confirmed spam:

  $ sa-learn --spam ~/Mail/Maildir/.Spam

Use the --ham argument to indicate the input is confirmed non-spam. Assuming your entire Inbox is free of spam, you can use:

  $ sa-learn --ham ~/Mail/Maildir/cur
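
The two commands can be combined into one pass. The following is a sketch under stated assumptions: train_bayes and the SA_LEARN override variable are hypothetical names introduced here, the default Maildir path is the one used in the examples above, and the --no-sync / --sync options are those of SpamAssassin 3.x.

```shell
# Sketch: learn spam and ham from a Maildir in one pass.
# SA_LEARN is a hypothetical override (useful for dry runs);
# by default the real sa-learn command is used.
train_bayes() (
    maildir="${1:-$HOME/Mail/Maildir}"
    sa_learn="${SA_LEARN:-sa-learn}"
    # --no-sync defers the journal/database sync so it runs only once,
    # in the final --sync step.
    "$sa_learn" --no-sync --spam "$maildir/.Spam" &&
    "$sa_learn" --no-sync --ham "$maildir/cur" &&
    "$sa_learn" --sync
)
```

Running SA_LEARN=echo train_bayes prints the arguments that would be passed to sa-learn without touching the Bayes database. As above, only train the Inbox as ham if it is known to be free of spam.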

To display information about your active Bayes database, use the --dump magic argument:

  $ sa-learn --dump magic
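
One practical use of this output is checking whether the database has seen enough mail: SpamAssassin only applies Bayes scores once a minimum number of spam and ham messages has been learned (200 each by default, via bayes_min_spam_num and bayes_min_ham_num). The sketch below assumes the counts appear as the third whitespace-separated field on the "non-token data: nspam" and "non-token data: nham" lines, as printed by 3.x releases; bayes_ready is a hypothetical helper name.

```shell
# Sketch: read `sa-learn --dump magic` output on stdin and report whether
# both message counts reach a minimum (default 200).
# Assumes the count is the third field of the nspam / nham lines.
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            # Exit status 0 only when both counts reach the minimum.
            exit !(nspam >= min && nham >= min)
        }'
}
```

It would be used as sa-learn --dump magic | bayes_ready, with the exit status indicating whether both counts have reached the threshold.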

For mutt users

If you are using the mutt mail user agent, you can add the following to your muttrc file so that specific keys can be used to trigger the learning of the selected message, either as spam or as ham (legitimate e-mail).

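  # Do not wait for a keypress after running an external command
  set wait_key=no

  # Option names are those of SpamAssassin 3.x; older releases used
  # --no-rebuild, --single and --rebuild instead.

  # H: Register the message as non-spam (ham)
  macro index H "<pipe-message>sa-learn --ham --no-sync<enter>"
  macro pager H "<pipe-message>sa-learn --ham --no-sync<enter>"

  # S: Register the message as spam
  macro index S "<pipe-message>sa-learn --spam --no-sync<enter>"
  macro pager S "<pipe-message>sa-learn --spam --no-sync<enter>"

  # R: Synchronize the Bayes database (call last)
  macro index R "<shell-escape>sa-learn --sync<enter>"
  macro pager R "<shell-escape>sa-learn --sync<enter>"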
