"Required SpamAssassin Score" and Bayes

Mon Jan 5 23:08:55 GMT 2004

Hi David,

>>> But we find more emails than we would expect still escape being
>>> spam-tagged: their spamscores seem strangely low.

I too have seen similar patterns of spam scoring strangely low and spent
some time over the weekend using MailWatch to work out why this was
happening.

I checked the IP addresses in the 'Received:' headers via OpenRBL.org and
realised that, although the sending hosts were listed in quite a few RBLs,
SpamAssassin had not picked up on this. Further debugging via:

spamassassin -D rbl=-3 -p /etc/MailScanner/spam.assassin.prefs.conf < message 2>&1 | less

and I discovered that, for some reason, SA was 'trusting' the first host on
the Received line and not checking it against the RBLs.  I ended up adding:

trusted_networks 127.0.0.1 10/8 172.16/12 192.168/16 <<external mx>> <<external mx>>

in spam.assassin.prefs.conf and double-checked the settings by running SA in
debug mode across a range of messages to make sure that SA was checking the
RBLs as expected.
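
As a rough illustration of what the trusted_networks line changes, the
following Python sketch splits relay addresses into trusted and untrusted
ones; only the untrusted ones would be candidates for RBL lookups. It is a
simplified model and not SpamAssassin's actual code: the function name
untrusted_relays and the sample addresses are made up, and the external MX
entries are left out because their addresses are not given above.

```python
import ipaddress

# Assumed trusted ranges for illustration: loopback plus the private ranges
# from the trusted_networks line (written out in full CIDR form, since
# Python does not accept the "10/8" shorthand). External MX hosts omitted.
TRUSTED = [
    ipaddress.ip_network(net)
    for net in ("127.0.0.1/32", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def untrusted_relays(relay_ips, trusted=TRUSTED):
    """Return the relay IPs that fall outside every trusted network."""
    result = []
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in trusted):
            result.append(ip)
    return result


if __name__ == "__main__":
    # Hypothetical relay chain taken from Received: headers, newest first.
    print(untrusted_relays(["192.168.1.5", "203.0.113.7", "10.0.0.3"]))
```

If an outside host ends up being treated as trusted, it never appears in
the returned list, which matches the symptom of RBL checks being skipped.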

For good measure I also added:

# Manually add in the CBL until SA has it by default
header RCVD_IN_CBL      eval:check_rbl_txt('cbl', 'cbl.abuseat.org.')
describe RCVD_IN_CBL    Received via a relay in cbl.abuseat.org
tflags RCVD_IN_CBL      net
score RCVD_IN_CBL       5

And where these low-scoring spams were once slipping through, they aren't
now!
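
For anyone unfamiliar with how a DNS blocklist such as the CBL is queried,
here is a minimal Python sketch of the usual convention: the IPv4 octets
are reversed and prepended to the list's zone, and a successful A-record
lookup means "listed". The function names are hypothetical, and is_listed
is only a sketch of the lookup step; it needs network access and says
nothing about whether a given list is still in service.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="cbl.abuseat.org"):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def is_listed(ip, zone="cbl.abuseat.org"):
    """Return True if the lookup resolves (listed), False if it fails."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, used purely as an example.
    print(dnsbl_query_name("192.0.2.1"))
```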

Hope this helps.

Kind regards,
Steve.

-----Original Message-----
From: David Lee
To: MAILSCANNER at JISCMAIL.AC.UK
Sent: 05/01/04 17:17
Subject: "Required SpamAssassin Score" and Bayes

Executive summary:  Might a high value of MS "Required SpamAssassin Score"
interact adversely with SA Bayes?

Detail:
We started site-wide use of MailScanner some time ago (mid-2001), and of
SpamAssassin back in 2002.  Because of our worries about false positives,
we adjusted the MailScanner.conf "Required SpamAssassin Score" from its
default of 5 up to 7.

Things have moved on, and we are now happily using SA 2.61 including its
Bayes aspects.  But we find more emails than we would expect still escape
being spam-tagged: their spamscores seem strangely low.  Might it be that
our artificially high "Required SpamAssassin Score = 7" is causing the
Bayes mechanism to auto-learn some "Score = 5" and "6" spams incorrectly
as hams, and perhaps then to cause future occurrences of these spams to be
marked down as hams (and thus escape being spam-tagged)?

I think we could reasonably confidently reduce "Required SA Score" from 7
down to 6 or 5, which would both catch a few more spams, and the resultant
Bayes autolearn might then catch more (positive feedback).

Is the above reasoning basically sound?  Or is it fundamentally flawed?
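
One point that bears on this question: SpamAssassin's auto-learning has its
own pair of thresholds, configured separately from MailScanner's "Required
SpamAssassin Score". The Python sketch below is only a toy model of that
separation. The function names are hypothetical, and the threshold values
0.1 and 12.0 are assumptions for illustration that should be checked
against the documentation for the installed SA version.

```python
def is_tagged(score, required=7.0):
    """Toy model: a message is tagged as spam at or above the required score."""
    return score >= required


def autolearn_decision(score, ham_threshold=0.1, spam_threshold=12.0):
    """Toy model of auto-learning with its own, separate thresholds.

    Returns "ham", "spam", or "none" (not learned at all).
    Threshold defaults are assumed values, not confirmed for any version.
    """
    if score < ham_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return "none"


if __name__ == "__main__":
    for score in (-1.0, 5.5, 6.5, 15.0):
        print(score, is_tagged(score), autolearn_decision(score))
```

Under these assumed values, a message scoring 5 or 6 is not tagged with a
required score of 7, but it is not auto-learned as ham either.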

A supplementary question: Our SA/Bayes is currently only self-learning.
Are there any nicely packaged schemes to allow us to supplement this from
emails from validated individuals?  A few of us could then redirect
(bounce) emails to, say, "sa-learn-ham at ..." and "sa-learn-spam at ..." (but
in such a way that it would verify the redirector/bouncer (or some
equivalent) against a list of trusted folk).

--

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham                :
:  Phone: +44 191 334 2752                  U.K.                  :
