Bayes not auto-learning at expected threshold

Mike Brudenell pmb1 at YORK.AC.UK
Tue Jun 29 11:03:01 IST 2004


Greetings -

We are using:
    MailScanner     4.29.3
    SpamAssassin    2.63

In my /etc/mail/spamassassin/local.cf I have:
    bayes_auto_learn_threshold_nonspam       0.1
    bayes_auto_learn_threshold_spam         12.0
and have checked my SpamAssassin config with its "--lint" option.

I thought this would tell SpamAssassin to auto-learn a message into the
Bayes database if its score was 12.0 or more.

Yet some of the messages I'm getting through achieve higher scores, but
aren't marked autolearn=spam.

Here are a few samples of interest...
Message #1 has a score of 19.362 and *IS* auto-learned, whilst Message #2
has a higher score but is *NOT* auto-learned.  Message #3 arrived after I
did a total stop/start of MailScanner, just to make sure I hadn't forgotten
to restart it previously: it too was *NOT* auto-learned.

=============================================================
Msg #1

X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=19.362, required 8,
        autolearn=spam, DCC_CHECK 2.91, FAKE_HELO_MAIL_COM 3.77,
        FORGED_MUA_OUTLOOK 2.57, FORGED_OUTLOOK_TAGS 1.00,
        HTML_FONTCOLOR_UNKNOWN 0.10, HTML_FONT_BIG 0.27, HTML_MESSAGE 0.10,
        HTML_MIME_NO_HTML_TAG 1.18, MIME_HTML_ONLY 0.32,
        MSGID_FROM_MTA_HEADER 0.70, OPT_HEADER 2.40, OPT_IN 0.23,
        RAZOR2_CF_RANGE_51_100 1.10, RAZOR2_CHECK 1.05,
        SARE_CHARSET_W1251 1.67)

=============================================================
Msg #2

X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=19.644, required 8,
        AS_SEEN_ON 1.49, BAYES_99 5.40, BIZ_TLD 0.10, CLICK_BELOW 0.10,
        EARN_MONEY 1.01, HTML_70_80 0.10, HTML_FONTCOLOR_RED 0.10,
        HTML_FONTCOLOR_UNSAFE 0.10, HTML_FONT_BIG 0.27,
        HTML_LINK_CLICK_HERE 0.10, HTML_MESSAGE 0.10, J_CHICKENPOX_13 0.60,
        J_CHICKENPOX_15 0.60, MIME_HTML_ONLY 0.32, RATWR9_MESSID 0.80,
        RCVD_IN_DYNABLOCK 2.60, RCVD_IN_NJABL_DYNA 3.54, RCVD_IN_SORBS 0.10,
        SARE_BOUNDARY_07 2.22)

=============================================================
Msg #3

X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=17.795, required 8,
        BAYES_99 5.40, MSGID_FROM_MTA_HEADER 0.70,
        RAZOR2_CF_RANGE_51_100 1.10, RAZOR2_CHECK 1.05,
        RCVD_IN_BL_SPAMCOP_NET 1.50, RCVD_IN_NJABL 0.10,
        RCVD_IN_NJABL_SPAM 1.21, RCVD_IN_RFCI 0.10, RCVD_IN_SBL 3.54,
        RCVD_IN_SORBS 0.10, WS_URI_RBL 3.00)

=============================================================
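
To compare the three samples side by side, the SpamCheck headers can be
split into rule/score pairs.  The following is a minimal Python sketch, not
part of MailScanner or SpamAssassin: the function names are made up for
illustration, and the regular expression assumes the "RULE_NAME score"
layout seen in the headers above.  It shows only which rules fired and
whether a BAYES_* rule is among them; it cannot reproduce the score that
the auto-learn check actually uses.

```python
import re

# Header value copied from Msg #3 above.
MSG3 = """spam, SpamAssassin (score=17.795, required 8,
        BAYES_99 5.40, MSGID_FROM_MTA_HEADER 0.70,
        RAZOR2_CF_RANGE_51_100 1.10, RAZOR2_CHECK 1.05,
        RCVD_IN_BL_SPAMCOP_NET 1.50, RCVD_IN_NJABL 0.10,
        RCVD_IN_NJABL_SPAM 1.21, RCVD_IN_RFCI 0.10, RCVD_IN_SBL 3.54,
        RCVD_IN_SORBS 0.10, WS_URI_RBL 3.00)"""


def parse_spamcheck(header):
    """Return (reported_score, {rule_name: score}) for one header value.

    Hypothetical helper: assumes upper-case rule names followed by a
    decimal score, as in the samples in this message.
    """
    reported = float(re.search(r"score=(-?\d+(?:\.\d+)?)", header).group(1))
    pairs = re.findall(r"\b([A-Z][A-Z0-9_]*)\s+(-?\d+\.\d+)", header)
    return reported, {name: float(value) for name, value in pairs}


def bayes_rules(rules):
    """Names of the rules that come from the Bayes classifier."""
    return sorted(name for name in rules if name.startswith("BAYES_"))


reported, rules = parse_spamcheck(MSG3)
listed_total = sum(rules.values())  # sum of the rounded per-rule scores

print("reported score:", reported)
print("rules listed:", len(rules))
print("Bayes rules:", bayes_rules(rules))
```

Run against all three headers, this makes it easy to see that Msg #1 lists
no BAYES_* rule whereas Msg #2 and Msg #3 both list BAYES_99.  The per-rule
scores in the header are rounded, so their sum only approximates the
reported score.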

Can anyone shed any light on this behaviour please?  I'm including what I
think is the relevant extract of debug output from MailScanner below.  I
assume it's something to do with the "Score Set" chosen and the score shown
for the auto-learn line, which is substantially lower than the final 14.47.

Is it that only the body-hits are used to decide whether to auto-learn,
rather than the head-hits being included as well?  (I assume the change
from a score of 8.365 to 9.07 has something to do with recomputing it
using a different Score Set?  Can anyone point me to information about
these?)

=============================================================

debug: RBL: success for 16 of 16 queries
debug: running meta tests; score so far=8.365
debug: auto-learn? ham=0.1, spam=12, body-hits=8.365, head-hits=5.554
debug: auto-learn: currently using scoreset 3.  recomputing score based on scoreset 1.
debug: Score set 1 chosen.
debug: auto-learn: original score: 9.07, recomputed score: 9.873
debug: Score set 3 chosen.
debug: auto-learn? no: inside auto-learn thresholds
debug: is spam? score=14.47 required=5 tests=BAYES_99,DCC_CHECK,HTML_FONTCOLOR_RED,HTML_MESSAGE,HTTP_ESCAPED_HOST,MSGID_FROM_MTA_HEADER,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_RFCI
Stopping now as you are debugging me.

=============================================================
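
Read literally, the debug lines above suggest that the value compared
against the two thresholds is the recomputed score (9.873) rather than the
final score (14.47).  A minimal Python sketch of that reading follows.  The
function name is hypothetical and the exact boundary handling ("below" the
ham threshold, "at or above" the spam threshold) is an assumption, not
SpamAssassin's own code.

```python
from typing import Optional


def autolearn_decision(score: float,
                       ham_threshold: float,
                       spam_threshold: float) -> Optional[str]:
    """Return "ham", "spam", or None when the score lies between thresholds.

    Assumed comparison rules, for illustration only.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # "inside auto-learn thresholds": nothing is learned


# Values taken from the debug output: ham=0.1, spam=12.
recomputed = autolearn_decision(9.873, 0.1, 12.0)   # score after recomputation
final = autolearn_decision(14.47, 0.1, 12.0)        # score in "is spam?" line

print("recomputed score ->", recomputed)
print("final score      ->", final)
```

Under these assumptions the recomputed score falls between the thresholds,
which matches the "no: inside auto-learn thresholds" line, whereas the
final score would have qualified for learning as spam.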

Cheers,

Mike Brudenell

--
The Computing Service, University of York, Heslington, York YO10 5DD, UK
Tel:+44-1904-433811  FAX:+44-1904-433740

* Unsolicited commercial e-mail is NOT welcome at this e-mail address. *
