"Required SpamAssassin Score" and Bayes

Mon Jan 5 17:17:44 GMT 2004

Executive summary:  Might a high value of the MailScanner (MS) "Required
SpamAssassin Score" interact adversely with SpamAssassin (SA) Bayes?

Detail:
We started site-wide use of MailScanner some time ago (mid-2001), and of
SpamAssassin back in 2002.  Because of our worries about false positives,
we adjusted the MailScanner.conf "Required SpamAssassin Score" from its
default of 5 up to 7.

Things have moved on, and we are now happily using SA 2.61, including its
Bayes features.  But we find that more emails than we would expect still
escape being spam-tagged: their spam scores seem strangely low.  Might our
artificially high "Required SpamAssassin Score = 7" be causing the Bayes
mechanism to auto-learn some spams scoring 5 or 6 incorrectly as hams, and
thus cause future occurrences of those spams to be marked down as hams (and
so escape being spam-tagged)?
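To make the hypothesis concrete, here is a minimal Python sketch of the two separate decisions involved: whether a message is tagged (MailScanner's "Required SpamAssassin Score") and whether Bayes autolearns it, and as what. The field names `autolearn_ham_below` and `autolearn_spam_at` are hypothetical; they loosely mirror SpamAssassin's `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` settings, and the numbers are illustrative only. The "coupled" case models the worry above (ham learning tied to the tagging threshold); the "decoupled" case models an independent, much lower ham threshold. Which of the two SA 2.61 actually resembles is exactly the open question.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    required_score: float       # MailScanner "Required SpamAssassin Score": tag at or above this
    autolearn_ham_below: float  # hypothetical: autolearn as ham only below this score
    autolearn_spam_at: float    # hypothetical: autolearn as spam at or above this score


def decide(score: float, t: Thresholds) -> tuple[bool, str]:
    """Return (tagged_as_spam, autolearn_action) for one message's score."""
    tagged = score >= t.required_score
    if score < t.autolearn_ham_below:
        learn = "ham"
    elif score >= t.autolearn_spam_at:
        learn = "spam"
    else:
        learn = "none"  # score falls between the two autolearn thresholds
    return tagged, learn


# The feared case: ham learning tied to the tagging threshold of 7.
coupled = Thresholds(required_score=7.0, autolearn_ham_below=7.0, autolearn_spam_at=12.0)
# Alternative: ham learning uses its own, much lower (illustrative) cut-off.
decoupled = Thresholds(required_score=7.0, autolearn_ham_below=0.1, autolearn_spam_at=12.0)

for score in (5.0, 6.0, 7.0):
    print(score, decide(score, coupled), decide(score, decoupled))
```

In the coupled model, spams scoring 5 or 6 are both left untagged and learned as ham, which is the feedback loop described above; in the decoupled model they are merely left untagged and not learned at all.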

I think we could, with reasonable confidence, reduce "Required SA Score"
from 7 to 6 or 5.  That would catch a few more spams directly, and the
resulting Bayes autolearning might then catch still more (positive feedback).

Is the above reasoning basically sound?  Or is it fundamentally flawed?

A supplementary question: our SA Bayes is currently only self-learning.
Are there any nicely packaged schemes that let us supplement this with
emails from validated individuals?  A few of us could then redirect
(bounce) emails to, say, "sa-learn-ham at ..." and "sa-learn-spam at ...",
with the scheme verifying the redirector/bouncer (or some equivalent)
against a list of trusted people.
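One possible shape for such a scheme, as a minimal Python sketch under stated assumptions: the training-address local parts (`sa-learn-spam`, `sa-learn-ham`), the `TRUSTED` set, and the helper names are all hypothetical, and it assumes the bouncing mail client adds a `Resent-From:` header, as redirect/bounce commands normally do. It only builds an `sa-learn --spam` or `sa-learn --ham` command line for a saved copy of the message rather than running it. A header check like this is not authentication: anyone can forge `Resent-From:`, so a real deployment would want something stronger.

```python
import email
import email.utils
from email.message import Message
from typing import Optional

# Hypothetical list of people allowed to train the filter.
TRUSTED = {"alice@example.ac.uk", "bob@example.ac.uk"}

# Hypothetical mapping from training address (local part) to sa-learn mode.
MODES = {"sa-learn-spam": "--spam", "sa-learn-ham": "--ham"}


def redirector(msg: Message) -> Optional[str]:
    """Lower-cased address of whoever bounced the message, from Resent-From.

    NOT authentication: any sender can forge Resent-* headers.
    """
    value = msg.get("Resent-From")
    if value is None:
        return None
    _, addr = email.utils.parseaddr(value)
    return addr.lower() or None


def learn_command(msg: Message, training_local_part: str, path: str) -> Optional[list[str]]:
    """Return an sa-learn command for the saved message at `path`, or None to ignore it."""
    mode = MODES.get(training_local_part)
    if mode is None or redirector(msg) not in TRUSTED:
        return None
    return ["sa-learn", mode, path]


# Usage: a spam bounced by a trusted user to the (hypothetical) spam address.
raw = (
    "Resent-From: Alice <Alice@example.ac.uk>\n"
    "From: spammer@example.com\n"
    "Subject: offer\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)
print(learn_command(msg, "sa-learn-spam", "/tmp/msg1"))
```

The command is returned rather than executed so that the caller (for example, a delivery-time filter on the training mailboxes) can decide how to save the message and when to run `sa-learn`.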

--

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham                :
:  Phone: +44 191 334 2752                  U.K.                  :