Feature request: spam / bayesian score in subject

Matt Kettler mkettler at evi-inc.com
Thu Nov 8 15:23:11 GMT 2007


Hugo van der Kooij wrote:

> 
> Which would allow me, or other users, to know immediately I do not have
> to feed this one to my bayesian filter because it has hit the maximum score.

You really should not try to avoid feeding messages to bayes for that reason.

Even if a message hits as high as possible on the bayes system, there can still 
be useful tokens in it that are worth training.

This is because bayes learns about words, not messages, and spam emails tend to 
slowly mutate over time. A message might hit really high on bayes because it has 
several key words that resemble last week's spam, but it can still contain 
several new words that will help it catch next week's spam.
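
To make that concrete, here is a rough Python sketch of per-token counting. It is 
not how SpamAssassin's bayes is actually implemented; the TokenCounts class, its 
method names, the tokenizer and the sample messages are all made up for 
illustration. It only shows that a message made mostly of known spam words 
can still carry tokens the database has never seen.

```python
import re
from collections import Counter


class TokenCounts:
    """Toy per-token spam/ham counter (hypothetical, not SpamAssassin's code)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def tokenize(text):
        # Simplistic tokenizer: lowercase words, each counted once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def unknown_tokens(self, text):
        # Tokens the database has never seen in either spam or ham.
        return {t for t in self.tokenize(text)
                if t not in self.spam and t not in self.ham}

    def token_spamminess(self, token):
        # Share of the token's relative frequency that comes from spam;
        # 0.5 means "no information" (token never seen).
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        return 0.5 if s + h == 0 else s / (s + h)


model = TokenCounts()
model.train("cheap pills online now", is_spam=True)              # last week's spam
model.train("meeting notes attached for monday", is_spam=False)  # some ham

# This week's spam reuses last week's words but adds a few new ones.
this_week = "cheap pills online now with bonus crypto"
new_before = model.unknown_tokens(this_week)
crypto_before = model.token_spamminess("crypto")

model.train(this_week, is_spam=True)
new_after = model.unknown_tokens(this_week)
crypto_after = model.token_spamminess("crypto")
```

Before training, "with", "bonus" and "crypto" are unknown to the toy database even 
though the message as a whole is dominated by known spam words. After training, 
those tokens have spam counts of their own, which is what helps with next week's 
variant.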

If you try to avoid training those messages, you're going to wind up having to 
"catch up" later on after you start missing spam. Don't let your bayes get 
behind the curve. Train your spam as spam, and don't try to avoid training some 
messages vs others.

The only time it might be worthwhile to skip training is when the autolearner 
has already kicked in and learned the message correctly. Re-training won't hurt, 
but it also won't do anything, so it's somewhat pointless.




