Fw: bayes autolearn

Soeren Gerlach so-mlist-alias at ALL-ABOUT-SHIFT.COM
Sun Jul 20 21:27:09 IST 2003


> On Sat, 19 Jul 2003 14:58:05 +0200, Soeren Gerlach
> <so-mlist-alias at ALL-ABOUT-SHIFT.COM> wrote:
>
> >I do have the same setup (two mail servers configured via MX as relays)
> >and I do the following with quite remarkable results:
> >
> > * Quarantine "high score spam" messages
> > * Collect them once a day from the two relays onto a separate
> >   consolidation server
> > * On that server, run "sa-learn" from SpamAssassin on them
> > * Additionally, feed the mails back to the Razor network
> > * Copy the resulting database back to the two relays (MailScanner must
> >   be stopped while copying back because of file locks)
> >
> >With some 5,000+ spam messages learned by now, this improves the overall
> >"yield" quite well. The problem with autolearn is that the "ham" portion
> >of the mail is only useful on a single-user basis (there are some
> >articles about this issue on the net), because each user's mail tends to
> >have its own individual vocabulary, while the spam portion, on the other
> >hand, does not.
> >
> >regards,
> >Soeren Gerlach
>
> How much ham do you use when training?
> From http://www.spamassassin.org/doc/sa-learn.html
> "It's also worth noting that training with a very small quantity of ham,
> will produce atrocious results. You should aim to train with at least the
> same amount (or more if possible!) of ham data than spam"

I use no ham at all; training on spam only works quite well.
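
In case it helps, the daily routine quoted above could be sketched roughly
like this in Python. It is only a sketch under assumptions: the host names,
directory paths, and the MailScanner stop/start commands are hypothetical
placeholders (none of them come from my setup), and the Razor reporting step
is left out because it depends on the local installation. The only
SpamAssassin call used is "sa-learn --spam <dir>".

```python
import subprocess


def build_plan(relays, quarantine_dir, collect_dir, bayes_dir, stop_cmd, start_cmd):
    """Return the daily routine as a list of argv lists, in execution order.

    All arguments are site-specific assumptions supplied by the caller.
    """
    plan = []
    # 1. Pull the quarantined high-score spam from every relay.
    for relay in relays:
        plan.append(["rsync", "-a", f"{relay}:{quarantine_dir}/", f"{collect_dir}/{relay}/"])
    # 2. Train the Bayes database on spam only (no ham is used).
    for relay in relays:
        plan.append(["sa-learn", "--spam", f"{collect_dir}/{relay}"])
    # 3. Copy the database back; MailScanner is stopped during the copy
    #    because of the file locks it holds on the Bayes files.
    for relay in relays:
        plan.append(["ssh", relay] + list(stop_cmd))
        plan.append(["rsync", "-a", f"{bayes_dir}/", f"{relay}:{bayes_dir}/"])
        plan.append(["ssh", relay] + list(start_cmd))
    return plan


def run_plan(plan, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = build_plan(
        relays=["relay1.example.com", "relay2.example.com"],
        quarantine_dir="/var/spool/quarantine/spam",
        collect_dir="/var/tmp/spam-collect",
        bayes_dir="/root/.spamassassin",
        stop_cmd=["/etc/init.d/MailScanner", "stop"],
        start_cmd=["/etc/init.d/MailScanner", "start"],
    )
    run_plan(example, dry_run=True)
```

Running it as is only prints the commands; call run_plan(plan, dry_run=False)
once the paths and commands have been adapted to the local setup.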


regards,
Soeren



More information about the MailScanner mailing list