"Required SpamAssassin Score" and Bayes

Julian Field mailscanner at ecs.soton.ac.uk
Mon Jan 5 17:24:32 GMT 2004

At 17:17 05/01/2004, you wrote:
>Executive summary:  Might a high value of MS "Required SpamAssassin Score"
>interact adversely with SA Bayes?
>We started site-wide use of MailScanner some time ago (mid-2001), and of
>SpamAssassin back in 2002.  Because of our worries about false positives,
>we adjusted the MailScanner.conf "Required SpamAssassin Score" from its
>default of 5 up to 7.
>Things have moved on, and we are now happily using SA 2.61 including its
>Bayes aspects.  But we find more emails than we would expect still escape
>being spam-tagged: their spam scores seem strangely low.  Might it be that
>our artificially high "Required SpamAssassin Score = 7" is causing the
>Bayes mechanism to auto-learn some "Score = 5" and "6" spams incorrectly
>as hams, and perhaps then to cause future occurrences of these spams to be
>marked down as hams (and thus escape being spam-tagged)?

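No. Auto-learning is triggered by two thresholds that are set inside
SpamAssassin itself (bayes_auto_learn_threshold_nonspam and
bayes_auto_learn_threshold_spam). The "Required SpamAssassin Score" is a
completely separate MailScanner setting, and SpamAssassin is never even
told what its value is.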
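To make the independence concrete, here is a minimal Python sketch of the two decisions side by side. The threshold values 0.1 (ham) and 12.0 (spam) are assumed to be SpamAssassin's defaults for those two options; the class and function names are hypothetical, and the real auto-learn logic inside SpamAssassin has extra conditions that this simplified sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    # MailScanner.conf "Required SpamAssassin Score": only MailScanner reads it.
    mailscanner_required: float = 7.0
    # SpamAssassin options (assumed defaults); only SpamAssassin reads these.
    autolearn_nonspam: float = 0.1   # bayes_auto_learn_threshold_nonspam
    autolearn_spam: float = 12.0     # bayes_auto_learn_threshold_spam


def mailscanner_tags_as_spam(score: float, t: Thresholds) -> bool:
    """MailScanner's decision: compare SpamAssassin's score with its own threshold."""
    return score >= t.mailscanner_required


def sa_autolearn(score: float, t: Thresholds) -> Optional[str]:
    """Simplified SpamAssassin auto-learn decision.

    t.mailscanner_required is never consulted here, mirroring the fact
    that SpamAssassin is never told MailScanner's threshold.
    """
    if score < t.autolearn_nonspam:
        return "ham"
    if score >= t.autolearn_spam:
        return "spam"
    return None  # between the two thresholds: not learnt at all


if __name__ == "__main__":
    for score in (5.0, 6.0, 8.0):
        t = Thresholds()
        print(score, mailscanner_tags_as_spam(score, t), sa_autolearn(score, t))
```

Under these assumptions a message scoring 5 or 6 falls between the two auto-learn thresholds, so it is not learnt as ham (or as spam) whatever MailScanner's threshold is set to.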

>I think we could reasonably confidently reduce "Required SA Score" from 7
>down to 6 or 5, which would both catch a few more spams, and the resultant
>Bayes autolearn might then catch more (positive feedback).

We run at 6 and see no false positives, just a few false negatives. At 5
we started seeing false positives, so that was too low.

>Is the above reasoning basically sound?  Or is it fundamentally flawed?

No, and yes :-)

>A supplementary question: Our SA/Bayes is currently only self-learning.
>Are there any nicely packaged schemes to allow us to supplement this from
>emails from validated individuals?  A few of us could then redirect
>(bounce) emails to, say, "sa-learn-ham at ..." and "sa-learn-spam at ..." (but
>in such a way that it would verify the redirector/bouncer (or some
>equivalent) against a list of trusted folk).

You can control access to those addresses using the check_compat ruleset
together with sendmail's access DB (the sendmail "Bat Book", 3rd edition,
explains how). You can then simply run sa-learn every hour with its --mbox
switch. I have a cron job that does this, which I have posted here several
times before; it might be called learn.spam or something similar. Look for
my postings with attachments (there aren't many of those).
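Below is a minimal Python sketch of such an hourly job; it is not the learn.spam script mentioned above. It also swaps in a weaker technique for access control than sendmail's check_compat: it keeps only messages whose Resent-From address (which many mail clients add when redirecting, or bouncing, a message) or From address is on a hypothetical TRUSTED_SENDERS list. Headers are easy to forge, which is why enforcing this at the MTA is the stronger option. The names, addresses and mbox paths are assumptions; the sa-learn --spam, --ham and --mbox switches are real.

```python
import mailbox
import subprocess
from email.utils import parseaddr

# Hypothetical: the people allowed to submit training mail (lower-case).
TRUSTED_SENDERS = {"alice@example.ac.uk", "bob@example.ac.uk"}


def submitter(msg) -> str:
    """Return the address of whoever redirected the message to us.

    A redirected (bounced) message usually keeps its original From:
    header and gains a Resent-From: header naming the person who bounced it.
    """
    header = msg.get("Resent-From") or msg.get("From", "")
    return parseaddr(header)[1].lower()


def filter_trusted(src_path: str, dst_path: str, trusted: set) -> int:
    """Copy messages submitted by trusted people from src mbox into dst mbox.

    Returns the number of messages kept. Locking the spool and emptying it
    after a successful learn are left out of this sketch.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path, create=True)
    kept = 0
    try:
        for msg in src:
            if submitter(msg) in trusted:
                dst.add(msg)
                kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept


def learn_command(kind: str, mbox_path: str) -> list:
    """Build the sa-learn command line for a spam or ham mbox."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


def run_learn(kind: str, mbox_path: str) -> None:
    """Feed the filtered mbox to SpamAssassin's Bayes database."""
    subprocess.run(learn_command(kind, mbox_path), check=True)
```

An hourly cron entry would call filter_trusted on the spool for the spam-submission address and then run_learn("spam", ...) on the result, and likewise for ham; a real job should also lock and empty the spool once learning has succeeded.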

Julian Field
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support
PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
