Which messages to feed to Bayes?

Thu Feb 26 23:31:06 GMT 2004

At 04:55 PM 2/26/2004, Michael St. Laurent wrote:
>Should we be feeding the Bayes engine in Spamassassin messages that it has
>recognized as spam?  The reason I am considering doing this is the thought
>that it would eventually increase the spam score on like emails until they
>break the high spam score level and get dropped instead of flagged.

My answer is a very emphatic YES!

There's absolutely NO valid reason to skip messages that SA already caught
when you train. And there are good, valid reasons to train on them.

Those who argue against training on tagged messages, or on messages that
already hit BAYES_99, do so only because they don't understand how Bayes
works. They wrongly conclude that it won't help SA with other spam.

The key factor is that Bayes doesn't learn to recognize an email... it
learns about spam in general from each email. SA applies the lessons learned
from one spam to other spam that isn't identical but shares some small
similarities.

Feeding it messages that SA already tags, even ones that are already
BAYES_99, can help prevent false negatives on messages that SA wouldn't
otherwise catch because none of their tokens matched anything it had
learned.

Even messages that are already BAYES_99 can contain tokens that SA hasn't
learned yet. BAYES_99 means that the tokens SA recognizes are collectively
likely to be spam, but it doesn't mean that there aren't any new tokens to
learn about in the message, and it doesn't mean that all the tokens even
have high spam probabilities.
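
To make the point concrete, here is a toy sketch in Python. It is NOT
SpamAssassin's actual Bayes code: the TokenStats class, its simplistic
tokenizer, and the sample messages are all made up for illustration. It
only shows that a spam which is already caught on its known tokens can
still teach the database new ones, and that those new tokens are what a
later, different spam may match on.

```python
import re
from collections import Counter


class TokenStats:
    """Toy per-token counts (hypothetical; not SpamAssassin's Bayes store)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Simplistic tokenizer: unique lowercase alphanumeric words.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def learn(self, text, is_spam):
        # Learning is per token, not per message.
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(self.tokenize(text))

    def known_tokens(self, text):
        # Tokens in this message that the database has seen before.
        return {t for t in self.tokenize(text)
                if t in self.spam_counts or t in self.ham_counts}

    def unseen_tokens(self, text):
        # Tokens in this message that the database has never seen.
        return self.tokenize(text) - self.known_tokens(text)


stats = TokenStats()
stats.learn("cheap pills online buy now", is_spam=True)
stats.learn("meeting agenda for monday", is_spam=False)

# A spam that is already caught on its known tokens, but carries new ones.
caught = "buy cheap pills now refinance mortgage"
new_in_caught = stats.unseen_tokens(caught)

# A later, different spam that shares only those new tokens.
later = "refinance your mortgage today"
known_before = stats.known_tokens(later)   # nothing to match on yet

stats.learn(caught, is_spam=True)          # train on the caught spam anyway
known_after = stats.known_tokens(later)    # now has tokens to match on

print(sorted(new_in_caught))
print(sorted(known_before), sorted(known_after))
```

Before training on the caught message, the later spam has no known
tokens at all. Afterwards, "refinance" and "mortgage" are in the database
with spam counts, so there is something to score it on.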