Bayesian training policy (crossposted from SpamAssassin ML)

Mon May 5 16:22:05 IST 2003

At 16:09 05/05/2003, you wrote:
>We recently set up MailScanner at our email gateway with a SA required
>score of 9 (just to avoid most false positives). After a couple of weeks
>of tests, we caught about 80% of spam with just one false positive (a
>mailing list with TONS of ads, we just whitelisted it). Trying to
>improve the detection ratio, we used sa-learn with about 2,000 messages
>of spam and 3,000 messages of ham (manually checked) from the last 3
>months and we now catch something like 90% of spam with no false
>positives.
>
>Now the question: we'd like to set up a Bayesian filter learning policy
>that makes sense. What are your suggestions?

What do you mean by a "learning policy that makes sense"?
SpamAssassin will auto-learn on very high and very low scoring mail anyway,
so mostly you can just leave it to get on with it.
Other than that, I use a pair of addresses, "spam" and "notspam", whose
mailboxes are piped into sa-learn every hour to correct the Bayes code when
it gets a message wrong.
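
As a rough illustration of the hourly job, here is a minimal Python sketch.
It assumes the two mailboxes are mbox files at /var/mail/spam and
/var/mail/notspam (hypothetical paths; the real locations depend on your
mail setup) and that sa-learn is on the PATH. The function names are made up
for this example. It only builds and runs the two sa-learn commands; emptying
the mailboxes afterwards is left out.

```python
import subprocess
from typing import List


def build_sa_learn_commands(spam_mbox: str, ham_mbox: str) -> List[List[str]]:
    """Return the two sa-learn invocations, one per training mailbox.

    --spam / --ham select the class to learn, --mbox tells sa-learn
    that the argument is an mbox file rather than a single message.
    """
    return [
        ["sa-learn", "--spam", "--mbox", spam_mbox],
        ["sa-learn", "--ham", "--mbox", ham_mbox],
    ]


def train(spam_mbox: str = "/var/mail/spam",       # hypothetical path
          ham_mbox: str = "/var/mail/notspam",     # hypothetical path
          dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run, just print) the sa-learn commands."""
    commands = build_sa_learn_commands(spam_mbox, ham_mbox)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if sa-learn fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling train(dry_run=False) from an hourly cron entry would approximate the
arrangement described above. Keeping dry_run=True as the default lets you
check the command lines before anything touches the Bayes database.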
--
Julian Field
www.MailScanner.info
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support