Bayesian filtering micro-howto
This guide assumes that you have already configured spam filtering,
either from the web interface or the shell interface with the
"mail rule" commands.

Registering known spam/ham

This is where you can finally put the old spam you have accumulated to good use. Using sa-learn, you can register the contents of a mailbox or a maildir as spam or as non-spam (ham). If the corpus is particularly large, this can take a long time, so the commands below run under nice to lower their priority.

    # Register SPAM from a mbox(5) file:
    $ nice sa-learn --spam --mbox ~/Mail/Mailbox.spam

    # Register HAM from a mbox(5) file:
    $ nice sa-learn --ham --mbox ~/Mail/Mailbox.ham

    # Register SPAM from a maildir(5):
    $ nice sa-learn --spam --maildir ~/Mail/Maildir.spam

    # Register HAM from a maildir(5):
    $ nice sa-learn --ham --maildir ~/Mail/Maildir.ham

Feedback

After you have registered your initial batch of messages, you will occasionally want to feed sa-learn more recent spam and ham samples. Bayesian filtering requires such feedback in order to be fully effective. For convenience, you will want to integrate the learning process with your mail reader.

Some spammers try to evade Bayesian filtering by including lists of random words in the message. Before registering such a message as spam, try to remove the randomly generated words with an editor.

Mutt macros

If you are using mutt, you can add the following to your muttrc file so that specific keys trigger the learning of the selected message, either as spam or as ham (legitimate e-mail). You may also find it convenient to create a macro that invokes the --rebuild function of sa-learn, which updates the Bayesian filtering database.

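    # Do not prompt for a keypress after the piped command finishes
    set wait_key=no

    # H: learn the selected message as ham; S: learn it as spam.
    # The trailing \n confirms mutt's pipe prompt so the command runs at once.
    macro index H "|sa-learn --ham --no-rebuild --single\n"
    macro pager H "|sa-learn --ham --no-rebuild --single\n"
    macro index S "|sa-learn --spam --no-rebuild --single\n"
    macro pager S "|sa-learn --spam --no-rebuild --single\n"

    # R: update the Bayesian database with everything learned so far
    macro index R "|sa-learn --rebuild\n"
    macro pager R "|sa-learn --rebuild\n"

SpamAssassin 3.0 and later rename these options to --no-sync and --sync and deprecate the older spellings.

Because sa-learn is a Perl script, it may take a while to execute. It may therefore be more convenient to use mutt's s key to append messages to files (say ~/s for spam and ~/h for ham), and then feed the contents of those files to sa-learn at some later time.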
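
The deferred approach could be scripted roughly as follows. This is only a sketch under a few assumptions: that ~/s and ~/h are mbox files (mutt's default format when saving), that the installed sa-learn accepts the --no-rebuild and --rebuild spellings used in this guide, and that the SA_LEARN variable and the learn_batch function are names made up for this example. Each file is moved aside before learning so that messages saved from mutt in the meantime are not lost.

```shell
#!/bin/sh
# Sketch: feed messages saved from mutt to sa-learn in one batch.
# SA_LEARN is a hypothetical override for the sa-learn command name.
SA_LEARN=${SA_LEARN:-sa-learn}
learned=0

# learn_batch KIND FILE
# KIND is "spam" or "ham"; FILE is assumed to be an mbox file.
learn_batch() {
    kind=$1
    file=$2

    # Nothing to do if the file is missing or empty.
    [ -s "$file" ] || return 0

    # Move the file aside first; mutt will create a fresh one on the next save.
    batch="$file.batch.$$"
    mv "$file" "$batch" || return 1

    if "$SA_LEARN" "--$kind" --no-rebuild --mbox "$batch"; then
        rm -f "$batch"
        learned=1
    else
        # Keep the batch file so that it can be retried by hand.
        echo "sa-learn failed; messages kept in $batch" >&2
        return 1
    fi
}

learn_batch spam "$HOME/s"
learn_batch ham "$HOME/h"

# Update the Bayesian database once, and only if something was learned.
if [ "$learned" -eq 1 ]; then
    "$SA_LEARN" --rebuild
fi
```

Saved under a name of your choice, the script can be run by hand or from cron, preferably under nice as with the initial registration.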