Bayes not auto-learning at expected threshold
Mike Brudenell
pmb1 at YORK.AC.UK
Tue Jun 29 11:03:01 IST 2004
Greetings -
We are using:
MailScanner 4.29.3
SpamAssassin 2.63
In my /etc/mail/spamassassin/local.cf I have:
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12.0
and have checked my SpamAssassin config with its "--lint" option.
I thought this would tell SpamAssassin to auto-learn a message into the
Bayes database if its score was 12.0 or more.
Yet some of the messages I'm getting through achieve higher scores, but
aren't marked autolearn=spam.
Here are a few samples of interest...
Message #1 has a score of 19.362 and *IS* auto-learned, whilst Message #2
has a higher score but is *NOT* auto-learned. Message #3 arrived after I
did a total stop/start of MailScanner, just to make sure I hadn't forgotten
to restart it previously: it too was *NOT* auto-learned.
=============================================================
Msg #1
X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=19.362, required 8,
autolearn=spam, DCC_CHECK 2.91, FAKE_HELO_MAIL_COM 3.77,
FORGED_MUA_OUTLOOK 2.57, FORGED_OUTLOOK_TAGS 1.00,
HTML_FONTCOLOR_UNKNOWN 0.10, HTML_FONT_BIG 0.27, HTML_MESSAGE 0.10,
HTML_MIME_NO_HTML_TAG 1.18, MIME_HTML_ONLY 0.32,
MSGID_FROM_MTA_HEADER 0.70, OPT_HEADER 2.40, OPT_IN 0.23,
RAZOR2_CF_RANGE_51_100 1.10, RAZOR2_CHECK 1.05,
SARE_CHARSET_W1251 1.67)
=============================================================
Msg #2
X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=19.644, required 8,
AS_SEEN_ON 1.49, BAYES_99 5.40, BIZ_TLD 0.10, CLICK_BELOW 0.10,
EARN_MONEY 1.01, HTML_70_80 0.10, HTML_FONTCOLOR_RED 0.10,
HTML_FONTCOLOR_UNSAFE 0.10, HTML_FONT_BIG 0.27,
HTML_LINK_CLICK_HERE 0.10, HTML_MESSAGE 0.10, J_CHICKENPOX_13 0.60,
J_CHICKENPOX_15 0.60, MIME_HTML_ONLY 0.32, RATWR9_MESSID 0.80,
RCVD_IN_DYNABLOCK 2.60, RCVD_IN_NJABL_DYNA 3.54, RCVD_IN_SORBS 0.10,
SARE_BOUNDARY_07 2.22)
=============================================================
Msg #3
X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=17.795, required 8,
BAYES_99 5.40, MSGID_FROM_MTA_HEADER 0.70,
RAZOR2_CF_RANGE_51_100 1.10, RAZOR2_CHECK 1.05,
RCVD_IN_BL_SPAMCOP_NET 1.50, RCVD_IN_NJABL 0.10,
RCVD_IN_NJABL_SPAM 1.21, RCVD_IN_RFCI 0.10, RCVD_IN_SBL 3.54,
RCVD_IN_SORBS 0.10, WS_URI_RBL 3.00)
=============================================================
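For anyone wanting to check the arithmetic on headers like these, here is a
minimal Python sketch that splits an X-York-MailScanner-SpamCheck value into
its per-rule scores and totals them with and without the BAYES_* rules. The
function name parse_spamcheck and the regular expressions are my own, not part
of MailScanner or SpamAssassin. The per-rule scores shown in the header come
from whichever score set was active, so the "without Bayes" total is only
illustrative and is not necessarily the figure the auto-learner compares.

```python
import re

# Msg #2 header value as quoted above, joined onto one logical line.
HEADER = (
    "spam, SpamAssassin (score=19.644, required 8, "
    "AS_SEEN_ON 1.49, BAYES_99 5.40, BIZ_TLD 0.10, CLICK_BELOW 0.10, "
    "EARN_MONEY 1.01, HTML_70_80 0.10, HTML_FONTCOLOR_RED 0.10, "
    "HTML_FONTCOLOR_UNSAFE 0.10, HTML_FONT_BIG 0.27, "
    "HTML_LINK_CLICK_HERE 0.10, HTML_MESSAGE 0.10, J_CHICKENPOX_13 0.60, "
    "J_CHICKENPOX_15 0.60, MIME_HTML_ONLY 0.32, RATWR9_MESSID 0.80, "
    "RCVD_IN_DYNABLOCK 2.60, RCVD_IN_NJABL_DYNA 3.54, RCVD_IN_SORBS 0.10, "
    "SARE_BOUNDARY_07 2.22)"
)


def parse_spamcheck(header):
    """Return (reported_score, autolearn_or_None, {rule_name: score})."""
    reported = float(re.search(r"score=(-?[\d.]+)", header).group(1))
    learned = re.search(r"autolearn=(\w+)", header)
    # Rule names are upper-case with digits/underscores, followed by a score.
    pairs = re.findall(r"\b([A-Z][A-Z0-9_]+) (-?\d+\.\d+)", header)
    rules = {name: float(value) for name, value in pairs}
    return reported, (learned.group(1) if learned else None), rules


reported, autolearn, rules = parse_spamcheck(HEADER)
total = sum(rules.values())
non_bayes = sum(v for name, v in rules.items() if not name.startswith("BAYES_"))

print(f"reported={reported} autolearn={autolearn}")
print(f"sum of listed rules={total:.2f} without BAYES_*={non_bayes:.2f}")
```

The summed figure can differ from the reported score in the last digit because
the header shows each rule rounded to two decimal places.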
Can anyone shed any light on this behaviour please? I'm including what I
think is the relevant extract of debug output from MailScanner below. I
assume it's something to do with the "Score Set" chosen and the score shown
for the auto-learn line, which is substantially lower than the final 14.47.
Is it that only the body-hits are used to determine whether to auto-learn,
rather than also including the head-hits? (I assume the change from the
8.365 shown before the meta tests to the "original score" of 9.07 is
something to do with recomputing it using a different Score Set? Can anyone
point to information about these?)
=============================================================
debug: RBL: success for 16 of 16 queries
debug: running meta tests; score so far=8.365
debug: auto-learn? ham=0.1, spam=12, body-hits=8.365, head-hits=5.554
debug: auto-learn: currently using scoreset 3. recomputing score based on
scoreset 1.
debug: Score set 1 chosen.
debug: auto-learn: original score: 9.07, recomputed score: 9.873
debug: Score set 3 chosen.
debug: auto-learn? no: inside auto-learn thresholds
debug: is spam? score=14.47 required=5
tests=BAYES_99,DCC_CHECK,HTML_FONTCOLOR_RED,HTML_MESSAGE,HTTP_ESCAPED_HOST,MSGID_FROM_MTA_HEADER,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_RFCI
Stopping now as you are debugging me.
=============================================================
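To make the comparison in that debug output explicit, here is a small Python
sketch that pulls out the thresholds and the recomputed score and reproduces
the "inside auto-learn thresholds" verdict. It is only my reading of the log,
not SpamAssassin's actual code: the function name autolearn_verdict is
hypothetical, and I am assuming that the recomputed score is what gets
compared and that the boundaries are inclusive.

```python
import re

# The auto-learn lines from the debug output above (wrapped line rejoined).
DEBUG = """\
debug: auto-learn? ham=0.1, spam=12, body-hits=8.365, head-hits=5.554
debug: auto-learn: currently using scoreset 3. recomputing score based on scoreset 1.
debug: auto-learn: original score: 9.07, recomputed score: 9.873
debug: auto-learn? no: inside auto-learn thresholds
debug: is spam? score=14.47 required=5
"""


def autolearn_verdict(debug_text):
    """Return (ham_threshold, spam_threshold, recomputed_score, verdict)."""
    thresholds = re.search(r"auto-learn\? ham=([\d.]+), spam=([\d.]+)", debug_text)
    ham_t = float(thresholds.group(1))
    spam_t = float(thresholds.group(2))
    recomputed = float(
        re.search(r"recomputed score: (-?[\d.]+)", debug_text).group(1)
    )
    # Assumption: inclusive comparison against the recomputed score.
    if recomputed >= spam_t:
        verdict = "spam"
    elif recomputed <= ham_t:
        verdict = "ham"
    else:
        verdict = "no"
    return ham_t, spam_t, recomputed, verdict


print(autolearn_verdict(DEBUG))
```

On this reading the final score of 14.47 never enters the decision: 9.873 lies
between 0.1 and 12, so the message is neither learned as ham nor as spam.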
Cheers,
Mike Brudenell
--
The Computing Service, University of York, Heslington, York YO10 5DD, UK
Tel:+44-1904-433811 FAX:+44-1904-433740
* Unsolicited commercial e-mail is NOT welcome at this e-mail address. *