Bayesian training policy (crossposted from SpamAssassin ML)

Andrea Cogliati
Mon May 5 16:09:48 IST 2003


I asked the same question on the SpamAssassin ML, but I'd like to hear
your opinion as well. Please don't flame me for crossposting... :-)

We recently set up MailScanner at our email gateway with a SpamAssassin
required score of 9 (just to avoid most false positives). After a couple
of weeks of testing, we caught about 80% of spam with just one false
positive (a mailing list with TONS of ads, which we simply whitelisted).
To improve the detection rate, we ran sa-learn on about 2,000 spam
messages and 3,000 ham messages (manually checked) from the last 3
months, and we now catch something like 90% of spam with no false
positives.

Now the question: we'd like to set up a sensible training policy for the
Bayesian filter. What are your suggestions?

Thank you in advance for any help,


