"Required SpamAssassin Score" and Bayes
David Lee
t.d.lee at DURHAM.AC.UK
Mon Jan 5 17:17:44 GMT 2004
Executive summary: Might a high value of MS "Required SpamAssassin Score"
interact adversely with SA Bayes?
Detail:
We started site-wide use of MailScanner some time ago (mid-2001), and of
SpamAssassin back in 2002. Because of our worries about false positives,
we adjusted the MailScanner.conf "Required SpamAssassin Score" from its
default of 5 up to 7.
Things have moved on, and we are now happily using SA 2.61, including its
Bayes features. But more emails than we would expect still escape
being spam-tagged: their spam scores seem strangely low. Might it be that
our artificially high "Required SpamAssassin Score = 7" is causing the
Bayes mechanism to auto-learn some spams scoring 5 or 6 incorrectly
as ham, so that future occurrences of these spams are then scored
down as ham (and thus escape being spam-tagged)?
I think we could fairly confidently reduce "Required SA Score" from 7
down to 6 or 5. That would catch a few more spams directly, and the
resulting Bayes auto-learning might then catch more still (positive
feedback).
Is the above reasoning basically sound? Or is it fundamentally flawed?
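
To make the worry concrete, here is a minimal Python sketch of the
hypothesis only. It ASSUMES that auto-learning simply copies the tagging
decision made against the required score, and that tagging uses a ">="
comparison; whether SA's Bayes auto-learn really behaves this way is
exactly the open question. The function names and sample scores are
hypothetical.

```python
def tag(score: float, required_score: float) -> str:
    """Tag a message as spam once its score reaches the required score."""
    return "spam" if score >= required_score else "ham"


def hypothetical_autolearn(score: float, required_score: float) -> str:
    """ASSUMPTION: auto-learn trains on whatever label tagging produced.

    This models the suspicion in the question, not documented SA behaviour.
    """
    return tag(score, required_score)


# Hypothetical spam scores, two of them in the 5-to-7 gap.
sample_scores = [2.0, 5.2, 6.4, 7.5]

for required in (7.0, 5.0):
    learned_as_ham = [
        s for s in sample_scores
        if hypothetical_autolearn(s, required) == "ham"
    ]
    print(f"required={required}: learned as ham -> {learned_as_ham}")
```

Under that assumption, the messages scoring 5.2 and 6.4 would be learned
as ham with a required score of 7 but as spam with a required score of 5.
If auto-learn uses separate thresholds of its own, the model does not
apply.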
A supplementary question: Our SA/Bayes is currently only self-learning.
Are there any nicely packaged schemes that would let us supplement this
with emails submitted by validated individuals? A few of us could then
redirect (bounce) emails to, say, "sa-learn-ham at ..." and
"sa-learn-spam at ...", but in such a way that the redirector/bouncer (or
some equivalent) is verified against a list of trusted folk.
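
The kind of gatekeeping described above might be sketched as follows.
This is only an illustration under stated assumptions: the trusted
addresses and the example.org domain are hypothetical, the redirector is
assumed to appear in a Resent-From header (falling back to From), and the
function only decides which sa-learn mode would apply; it does not run
anything. Header checks like this are easy to forge, so a real scheme
would need stronger verification.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical list of people allowed to submit training mail.
TRUSTED = {"alice@example.org", "bob@example.org"}

# Local part of the receiving address -> sa-learn mode flag.
LEARN_MODES = {"sa-learn-ham": "--ham", "sa-learn-spam": "--spam"}


def learn_command(raw_message, recipient, trusted=TRUSTED):
    """Return the sa-learn argv for this submission, or None to reject it."""
    local_part = recipient.split("@", 1)[0].lower()
    mode = LEARN_MODES.get(local_part)
    if mode is None:
        return None  # not addressed to one of the learning mailboxes

    msg = message_from_string(raw_message)
    # ASSUMPTION: a redirect (bounce) records the redirector in Resent-From.
    sender_header = msg.get("Resent-From") or msg.get("From", "")
    _, address = parseaddr(sender_header)
    if address.lower() not in trusted:
        return None  # redirector is not on the trusted list

    return ["sa-learn", mode]
```

A caller would pass the raw message text and the address it was delivered
to, and feed the message to sa-learn only when a command is returned.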
--
:  David Lee                           I.T. Service          :
:  Systems Programmer                  Computer Centre       :
:                                      University of Durham  :
:  http://www.dur.ac.uk/t.d.lee/       South Road            :
:                                      Durham                :
:  Phone: +44 191 334 2752             U.K.                  :