Bayes not auto-learning at expected threshold
Desai, Jason
jase at SENSIS.COM
Tue Jun 29 13:37:02 IST 2004
Quoting from man Mail::SpamAssassin::Conf -
========
Note that certain tests are ignored when determining
whether a message should be trained upon:
- auto-whitelist (AWL)
- rules with tflags set to 'learn' (the Bayesian rules)
- rules with tflags set to 'userconf' (user white/black-listing rules, etc)
Also note that auto-training occurs using scores from
either scoreset 0 or 1, depending on what scoreset is
used during message check. It is likely that the message
check and auto-train scores will be different.
========
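
To make that concrete with the figures in the debug output quoted below, here is a rough Python sketch. It is not SpamAssassin's code: `autolearn_decision` is a hypothetical helper, the boundary comparisons are an assumption, and it assumes BAYES_99 scored 5.40 for the debugged message, as it does in the sample headers. Dropping BAYES_99 (a 'learn' rule) from the final 14.47 gives the 9.07 reported as "original score", and the scoreset 1 recomputation gives 9.873, which is under the 12.0 spam threshold.

```python
# Thresholds from the local.cf quoted in the original message.
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(score, ham=HAM_THRESHOLD, spam=SPAM_THRESHOLD):
    """Hypothetical helper: compare a score with the auto-learn thresholds.

    Assumption: spam is learned at or above the spam threshold and ham at
    or below the ham threshold; the real boundary handling may differ.
    """
    if score >= spam:
        return "spam"
    if score <= ham:
        return "ham"
    return "no"


final_score = 14.47  # "is spam? score=14.47" in the debug output
bayes_99 = 5.40      # BAYES_99 score as shown in the sample headers (assumed)

# Rules with tflags 'learn' are ignored for auto-learning.
without_learn_rules = round(final_score - bayes_99, 2)

# Value reported by "auto-learn: ... recomputed score" for scoreset 1.
recomputed = 9.873

print(without_learn_rules)                      # matches "original score: 9.07"
print(autolearn_decision(final_score))          # what the headline score suggests
print(autolearn_decision(without_learn_rules))  # after dropping BAYES_99
print(autolearn_decision(recomputed))           # what the debug log decided on
```

So the score that decides auto-learning can be well below the score shown in the headers whenever a BAYES_* rule contributed to it.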
Jase
Mike Brudenell wrote:
> Greetings -
>
> We are using:
> MailScanner 4.29.3
> SpamAssassin 2.63
>
> In my /etc/mail/spamassassin/local.cf I have:
> bayes_auto_learn_threshold_nonspam 0.1
> bayes_auto_learn_threshold_spam 12.0
> and have checked my SpamAssassin config with its "--lint" option.
>
> I thought this would tell SpamAssassin to auto-learn a message into
> the Bayes database if its score was 12.0 or more.
>
> Yet some of the messages I'm getting through achieve higher scores, but
> aren't marked autolearn=spam.
>
> Here are a few samples of interest...
> Message #1 has a score of 19.362 and *IS* auto-learned, whilst
> Message #2 has a higher score but is *NOT* auto-learned. Message #3
> arrived after I did a total stop/start of MailScanner just to make
> sure I hadn't forgotten previously: it too was *NOT* auto-learned.
>
> =============================================================
> Msg #1
>
> X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=19.362,
> required 8, autolearn=spam, DCC_CHECK 2.91,
> FAKE_HELO_MAIL_COM 3.77, FORGED_MUA_OUTLOOK 2.57,
> FORGED_OUTLOOK_TAGS 1.00, HTML_FONTCOLOR_UNKNOWN 0.10,
> HTML_FONT_BIG 0.27, HTML_MESSAGE 0.10, HTML_MIME_NO_HTML_TAG
> 1.18, MIME_HTML_ONLY 0.32, MSGID_FROM_MTA_HEADER 0.70,
> OPT_HEADER 2.40, OPT_IN 0.23, RAZOR2_CF_RANGE_51_100 1.10,
> RAZOR2_CHECK 1.05, SARE_CHARSET_W1251 1.67)
>
> =============================================================
> Msg #2
>
> X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=19.644,
> required 8, AS_SEEN_ON 1.49, BAYES_99 5.40, BIZ_TLD 0.10,
> CLICK_BELOW 0.10, EARN_MONEY 1.01, HTML_70_80 0.10,
> HTML_FONTCOLOR_RED 0.10, HTML_FONTCOLOR_UNSAFE 0.10,
> HTML_FONT_BIG 0.27, HTML_LINK_CLICK_HERE 0.10, HTML_MESSAGE
> 0.10, J_CHICKENPOX_13 0.60, J_CHICKENPOX_15 0.60,
> MIME_HTML_ONLY 0.32, RATWR9_MESSID 0.80, RCVD_IN_DYNABLOCK
> 2.60, RCVD_IN_NJABL_DYNA 3.54, RCVD_IN_SORBS 0.10,
> SARE_BOUNDARY_07 2.22)
>
> =============================================================
> Msg #3
>
> X-York-MailScanner-SpamCheck: spam, SpamAssassin (score=17.795,
> required 8, BAYES_99 5.40, MSGID_FROM_MTA_HEADER 0.70,
> RAZOR2_CF_RANGE_51_100 1.10, RAZOR2_CHECK 1.05,
> RCVD_IN_BL_SPAMCOP_NET 1.50, RCVD_IN_NJABL 0.10,
> RCVD_IN_NJABL_SPAM 1.21, RCVD_IN_RFCI 0.10, RCVD_IN_SBL 3.54,
> RCVD_IN_SORBS 0.10, WS_URI_RBL 3.00)
>
> =============================================================
>
> Can anyone shed any light on this behaviour please? I'm including
> what I think is the relevant extract of debug output from MailScanner
> below. I assume it's something to do with the "Score Set" chosen and
> the score shown for the auto-learn line, which is substantially lower
> than the final 14.47.
>
> Is it something like only the body-hits is used to determine whether
> to auto-learn or not rather than also including the head-hits? (I
> assume the change from a head-hits of 8.365 to 9.07 is something to
> do with recomputing it using a different Score Set? Can anyone point
> to information about these?)
>
> =============================================================
>
> debug: RBL: success for 16 of 16 queries
> debug: running meta tests; score so far=8.365
> debug: auto-learn? ham=0.1, spam=12, body-hits=8.365, head-hits=5.554
> debug: auto-learn: currently using scoreset 3. recomputing score
> based on scoreset 1.
> debug: Score set 1 chosen.
> debug: auto-learn: original score: 9.07, recomputed score: 9.873
> debug: Score set 3 chosen.
> debug: auto-learn? no: inside auto-learn thresholds
> debug: is spam? score=14.47 required=5
>
> tests=BAYES_99,DCC_CHECK,HTML_FONTCOLOR_RED,HTML_MESSAGE,
> HTTP_ESCAPED_HOST,MSGID_FROM_MTA_HEADER,RAZOR2_CF_RANGE_51_100,
> RAZOR2_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_RFCI
> Stopping now as you are debugging me.
>
> =============================================================
>
> Cheers,
>
> Mike Brudenell