Which messages to feed to Bayes?

Peter Bonivart peter at UCGBOOK.COM
Fri Feb 27 18:26:29 GMT 2004


Matt Kettler wrote:
> As you've mentioned sa-learn already has the feature of not re-learning the
> same message, so you're not wasting much CPU time for SA to decide it's
> already seen a given message ID and move on without learning it. That
> feature isn't a reason to avoid training..
>
> Using a mailbox of 205 messages, all tagged by SA. 108 of them were over 15
> in score, 97 of them under 15 in score but over 5.
>
>         110 of the 205 were learnable by sa-learn.
>
> I'll admit I'm using the default threshold of 12.. but that autolearned
> less than half of these messages. It didn't even autolearn all the
> high-scoring messages.

I was under the impression that he would be manually feeding mail from
an Exchange system to sa-learn. I doubt he can get better results than
autolearn in that case, especially considering the effort involved.

Interesting experiment you did with the autolearn feature. I would guess
locking issues caused most of the failures.

In 4.26, where an autolearn flag was introduced in the log, do you know
if that flag indicates a message that was truly learned, or just that
autolearn was triggered and might still have failed?

--
/Peter Bonivart

--Unix lovers do it in the Sun

Sun Fire V210, Solaris 9, Sendmail 8.12.10, MailScanner 4.25-14,
SpamAssassin 2.63 + DCC 1.2.30, ClamAV 0.67 + GMP 4.1.2


