Bayes Poisoning? Spam with negative BAYES Scores - ahhhh

Matt Kettler mkettler at EVI-INC.COM
Wed Dec 24 16:50:21 GMT 2003


Short summary: Do not rely on autolearning as a sole source of bayes
training. It doesn't work.

At 11:07 AM 12/24/2003, Nathan Johanson wrote:
>Yep, I continue to see lots of SPAM getting through due to negative hits
>on Bayes.
>
>SpamAssassin (score=0.801,
>required 4, BAYES_00 -4.90, FORGED_RCVD_NET_HELO 4.10,
>HTML_MESSAGE 0.10, RCVD_NUMERIC_HELO 1.50)
>
>This is my game plan. I plan to implement these modifications in stages:

<snip>

>(5) Create some spam trap accounts (sales@ or webmaster@) and start
>training the bayes databases manually. I have been relying on the
>autolearning mechanism up to this point and it's clear that this isn't
>enough. I figure that if I feed some of these offending messages in to
>the system as SPAM, it should help resolve the problem. However, I am a
>little worried that this may tip the scales the other direction and
>cause more false positives. **Note: Anyone with some good pointers on
>this strategy, please send me your advise**


YES! Do this ASAP.

SpamAssassin's bayes engine MUST be manually trained to be effective.
Bayes databases built from auto-learning alone do not work well, as
you've seen.
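
To put numbers on it, here is a quick sketch that adds up the rule scores
from the report you quoted. The scores are copied from that report; the
helper name total() is only for illustration. The displayed scores are
rounded, so the sum comes out a hair off the 0.801 SpamAssassin reported.

```python
# Rule scores copied from the quoted SpamAssassin report.
hits = {
    "BAYES_00": -4.90,
    "FORGED_RCVD_NET_HELO": 4.10,
    "HTML_MESSAGE": 0.10,
    "RCVD_NUMERIC_HELO": 1.50,
}
REQUIRED = 4.0  # "required 4" from the same report


def total(scores):
    """Sum the rule scores, rounded to two decimals to hide float noise."""
    return round(sum(scores.values()), 2)


with_bayes = total(hits)
without_bayes = total({k: v for k, v in hits.items() if k != "BAYES_00"})

print("with BAYES_00:   ", with_bayes, "tagged:", with_bayes >= REQUIRED)
print("without BAYES_00:", without_bayes, "tagged:", without_bayes >= REQUIRED)
```

In other words, the other three rules alone would have put this message
over your threshold. The bad BAYES_00 hit is what let it through.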

The autolearn function is intended to be a supplement to, but not a
replacement for, manual training.

If you don't have any good source of ham training, set up some
"anti-spamtraps" too: create an account and subscribe it to some
legitimate newsletters, then feed its mail as --ham training.
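
A minimal sketch of how the feeding could be scripted, assuming the trap
accounts deliver to the directories named below (both paths are made up,
as are the function names). It only relies on sa-learn's --spam and --ham
options, and it should run as whichever user owns the bayes database your
scanner actually reads.

```python
import shutil
import subprocess

# Hypothetical mail directories for the two trap accounts; adjust to taste.
SPAMTRAP_DIR = "/var/spool/traps/spamtrap/cur"
HAMTRAP_DIR = "/var/spool/traps/newsletters/cur"


def sa_learn_command(kind, path):
    """Build an sa-learn command line; kind must be 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, path]


def train(kind, path):
    """Run sa-learn if it is installed; otherwise just show the command."""
    cmd = sa_learn_command(kind, path)
    if shutil.which("sa-learn") is None:
        print("sa-learn not found, would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    train("spam", SPAMTRAP_DIR)  # spamtrap mail is learned as spam
    train("ham", HAMTRAP_DIR)    # newsletter mail is learned as ham
```

Run it from cron or by hand after you've glanced over what the traps
collected, so a stray legitimate message doesn't get learned as spam.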

A good, healthy bayes database that's well fed avoids a lot of these bayes
misclassifications. I still get a few in the BAYES_44 or so range, but I've
not gotten many BAYES_00 spams.

Out of the 3191 tagged spams and 319 false negatives I have on hand, only
11 matched BAYES_00.

(Note: that's not representative of my FN rate. The false negatives are
ALL of my FNs going back to 2002, whereas I discard old tagged spam
regularly. All 3191 tagged spams are fresh enough to have been run
against bayes.)
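
For scale, a quick back-of-the-envelope calculation, assuming the 11
BAYES_00 hits are counted across both piles combined (the variable names
are only for illustration):

```python
tagged_spam = 3191      # tagged spams on hand
false_negatives = 319   # all false negatives going back to 2002
bayes_00_hits = 11      # spams that matched BAYES_00

corpus = tagged_spam + false_negatives
share = bayes_00_hits / corpus

print(f"{bayes_00_hits} of {corpus} spams hit BAYES_00 ({share:.2%})")
```

That works out to roughly a third of one percent of the spam I have kept.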


