Oversight in MailScanner's Bayes Implementation?

Sat Dec 13 20:41:28 GMT 2003

It appears that the Bayes feature available through SpamAssassin,
specifically as MailScanner implements it, evaluates all incoming mail in
bulk, i.e. individual users do not each have their own Bayes database.

Is my assumption correct?

If so, the power of the Bayes analysis seems markedly reduced under the
default MailScanner configuration.  As the accuracy of the Bayes algorithm
relies upon the patterns specific to each individual user's collection of
spam and ham, training a single Bayes database on all incoming mail seems
less powerful than per-user databases.

Of course, with a small number of users receiving similar types of incoming
ham and spam, the decrease in Bayes accuracy might not be noticeable.  But
with larger variations in incoming mail between users, could this not
reduce the power of the Bayes implementation to the standard of the
traditional SpamAssassin rules?

My own experience seems to suggest so.  A recent addition of a user with
quite different spam and ham patterns from others on the same server seemed
to qualitatively increase the false negatives for all users.  Migration of
his account to a separate server immediately restored the spam detection
accuracy.
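
To illustrate the concern, here is a toy sketch of how a shared database
can dilute a token that is strongly spammy for one user.  It is not
SpamAssassin's actual Bayes code or anything MailScanner ships: the
ToyBayes class, the two users, and their messages are all made up for
illustration, and the scoring is a simple smoothed frequency ratio.

```python
from collections import Counter


class ToyBayes:
    """Toy token-frequency model (hypothetical, not SpamAssassin's algorithm)."""

    def __init__(self):
        self.spam = Counter()  # number of spam messages containing each token
        self.ham = Counter()   # number of ham messages containing each token
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        # Count each token at most once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def token_spamminess(self, token):
        # Smoothed per-class frequencies, so unseen tokens score 0.5.
        p_spam = (self.spam[token] + 1) / (self.n_spam + 2)
        p_ham = (self.ham[token] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)


# Made-up training mail: for Alice "pills" only ever appears in spam,
# while for Bob (say, a pharmacist) it appears mostly in legitimate mail.
alice_mail = [
    ("cheap pills online", True),
    ("pills discount offer", True),
    ("project meeting tomorrow", False),
    ("budget report attached", False),
]
bob_mail = [
    ("refill pills prescription", False),
    ("pills dosage schedule", False),
    ("pills stock order", False),
    ("win lottery now", True),
]

# Per-user database: trained on Alice's mail only.
alice_db = ToyBayes()
for text, is_spam in alice_mail:
    alice_db.learn(text, is_spam)

# Shared database: trained on everyone's mail, as in the bulk setup.
shared_db = ToyBayes()
for text, is_spam in alice_mail + bob_mail:
    shared_db.learn(text, is_spam)

alice_only = alice_db.token_spamminess("pills")
shared = shared_db.token_spamminess("pills")
print(f"per-user: {alice_only:.2f}  shared: {shared:.2f}")
```

In this toy data the token "pills" scores well above neutral in Alice's
own database but close to neutral in the shared one, which is the kind of
dilution that would let more of her spam through.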

I'm sure per-user databases would be possible through a hack of
MailScanner, but might their default availability in a future release
enhance MailScanner's sophistication?