Which messages to feed to Bayes?
peter at UCGBOOK.COM
Fri Feb 27 18:26:29 GMT 2004
Matt Kettler wrote:
> As you've mentioned, sa-learn already has the feature of not re-learning the
> same message, so you're not wasting much CPU time for SA to decide it's
> already seen a given message ID and move on without learning it. That
> feature isn't a reason to avoid training.
> I tested with a mailbox of 205 messages, all tagged by SA. 108 of them scored
> over 15, and the other 97 scored under 15 but over 5.
> 110 of the 205 were learnable by sa-learn.
> I'll admit I'm using the default threshold of 12, but that autolearned
> less than half of these messages. It didn't even autolearn all the
> high-scoring messages.
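
For reference, the arithmetic behind those figures can be sketched in a few lines of Python. This is only a sanity check on the quoted counts, and it rests on one assumption that is not stated outright in the quote: that every message sa-learn did not learn from had already been autolearned.

```python
# Counts quoted from Matt's message.
total = 205
over_15 = 108            # scored above 15
between_5_and_15 = 97    # scored above 5 but below 15
learnable = 110          # messages sa-learn still learned from

# The two score groups should account for the whole mailbox.
assert over_15 + between_5_and_15 == total

# Assumption: a message sa-learn skipped was already in the Bayes
# database, i.e. it had been autolearned earlier.
already_learned = total - learnable
share = already_learned / total

print(f"{already_learned} of {total} already learned ({share:.1%})")
print(f"high scorers missed by autolearn: at least {over_15 - already_learned}")
```

Under that assumption, 95 of the 205 messages (46.3%) had been autolearned, and at least 13 of the messages scoring over 15 had not, which matches both of the claims above.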
I was under the impression that he was going to feed mail from an Exchange
system to sa-learn by hand. I doubt he can get better results than autolearn
in that case, especially considering the effort involved.
That was an interesting experiment you did with the autolearn feature. I would
guess that locking issues caused most of the failures.
In 4.26, where an autolearn flag was introduced in the log, do you know
whether that flag indicates a message that was truly learned, or only that
autolearning was triggered and might still have failed?
--Unix lovers do it in the Sun
Sun Fire V210, Solaris 9, Sendmail 8.12.10, MailScanner 4.25-14,
SpamAssassin 2.63 + DCC 1.2.30, ClamAV 0.67 + GMP 4.1.2