Bayesian training policy (crossposted from SpamAssassin ML)

Julian Field mailscanner at
Mon May 5 16:22:05 IST 2003

At 16:09 05/05/2003, you wrote:
>We recently set up MailScanner at our email gateway with a SA required
>score of 9 (just to avoid most false positives). After a couple of weeks
>of tests, we caught about 80% of spam with just one false positive (a
>mailing list with TONS of ads, we just whitelisted it). Trying to
>improve the detection ratio, we used sa-learn with about 2,000 messages
>of spam and 3,000 messages of ham (manually checked) from the last 3
>months and we now catch something like 90% of spam with no false
>positives.
>Now the question: we'd like to set up a Bayesian filter learning policy
>that makes sense. What are your suggestions?

What do you mean by a "learning policy that makes sense"?
SpamAssassin will auto-learn on very high and very low scoring mail anyway,
so mostly you can just leave it to get on with it.
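The auto-learning described above can be pictured as a simple two-threshold rule: learn as ham below a low score, learn as spam above a high score, and learn nothing in between. The sketch below is simplified. The 0.1 and 12.0 values are assumptions modelled on SpamAssassin's bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings, and the real implementation applies further checks before learning. The function name autolearn_class is hypothetical.

```python
from typing import Optional

# Assumed thresholds, modelled on SpamAssassin's
# bayes_auto_learn_threshold_nonspam / bayes_auto_learn_threshold_spam;
# check the values in your own SpamAssassin configuration.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_class(score: float,
                    nonspam_threshold: float = NONSPAM_THRESHOLD,
                    spam_threshold: float = SPAM_THRESHOLD) -> Optional[str]:
    """Simplified auto-learn decision (hypothetical helper).

    Only clearly low or clearly high scores are learned; everything in
    the middle band is left for manual training.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # not confident enough to auto-learn


if __name__ == "__main__":
    for s in (-2.0, 5.0, 10.0, 15.0):
        print(s, autolearn_class(s))
```

Under these assumed values, a message scoring 10 would be tagged as spam at the poster's required score of 9 but would not be auto-learned. That middle band is where manual training helps most.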
Other than that I use a couple of "spam" and "notspam" addresses, whose
mailboxes are piped into sa-learn every hour to correct the Bayes code when
it gets it wrong.
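An hourly job of the kind described above might look roughly like the following sketch. It assumes the "spam" and "notspam" addresses are delivered to mbox files at the hypothetical paths shown, and that the job is scheduled hourly (for example from cron). It uses sa-learn's --spam, --ham and --mbox options; the helper names build_commands and train are illustrative only, not part of MailScanner or SpamAssassin.

```python
import subprocess
from pathlib import Path

# Hypothetical delivery locations for the "spam" and "notspam" addresses;
# adjust these to wherever your MTA actually writes those mailboxes.
MAILBOXES = {
    "--spam": Path("/var/spool/mail/spam"),
    "--ham": Path("/var/spool/mail/notspam"),
}


def build_commands(mailboxes):
    """Return one sa-learn argument list per mailbox (mbox format assumed)."""
    return [["sa-learn", flag, "--mbox", str(path)]
            for flag, path in mailboxes.items()]


def train(mailboxes, dry_run=True):
    """Feed each existing mailbox to sa-learn; only print commands if dry_run."""
    for cmd in build_commands(mailboxes):
        if not Path(cmd[-1]).exists():
            continue  # nothing has been delivered to this address yet
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raise if sa-learn fails


if __name__ == "__main__":
    train(MAILBOXES, dry_run=True)
```

Keeping command construction separate from execution makes it easy to check what would be run before switching dry_run off.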
Julian Field
Professional Support Services at
MailScanner thanks transtec Computers for their support
