Spamprob Training
Kai Schaetzl
maillists at conactive.com
Mon May 22 12:31:16 IST 2006
Yadavenedra Awasthi wrote on Mon, 22 May 2006 14:46:06 +0530:
> What are different training methods for spamprobe . e.g Global, user
> specific , any other if exists. Which is the most preferable method and
> why... there advantages and disadvantages. There are ways that it can be
> achieved by procmail, automatic script or asking users to manually mark
> mails as spam but this is possible if IMAP is used if users use POP
> without leave a copy and then what is the way out.
You could have them mail the offending spam to a certain alias that sucks
these messages in and applies some additional steps to make sure that the
format of the mail is OK for sa-learn. I suggest you read the docs and the
wiki for SpamAssassin, since this is all SA territory and this kind of
question has been answered and discussed there (on the sa-talk list)
umpteen times. It really depends on your needs. There are organizations
where the IMAP junk folder learning method works very well, and there are
quite a few solutions for doing this, all of which have been explained on
the sa-talk list. There are others where it can't work because most people
retrieve their mail locally.

The same goes for global vs. user-specific Bayes. It depends on how much
space you want to spend on this and how diverse the spam you are getting
is (and probably other things ;-). In my experience a global Bayes
database works just fine and is the better choice. The reasoning behind
this is that unless you (as a single person) get a *lot* of mail, you won't
get enough of it to train your Bayes database to the point where it really
helps spam detection. And user-specific Bayes databases can take up quite a
bit of space. But that's just a rule of thumb; it doesn't mean a global
database is always the better solution.
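
For the alias approach mentioned earlier, the "additional steps" could look
roughly like the Python sketch below. It assumes users forward the spam *as
an attachment* (message/rfc822), so that the original headers survive; the
function names are made up for illustration, and how the script gets wired
to the alias (e.g. a pipe in the aliases file) depends on your MTA. It
relies on sa-learn reading a message from stdin when no file is given.

```python
import email
import subprocess
from email import policy


def extract_forwarded(raw):
    """Return the forwarded originals (as bytes) found in a report mail.

    Only message/rfc822 attachments are picked up; an inline forward
    has already lost the original headers and is ignored here.
    """
    outer = email.message_from_bytes(raw, policy=policy.default)
    found = []
    stack = [outer]
    while stack:
        part = stack.pop()
        if part.get_content_type() == "message/rfc822":
            # The payload of such a part is a one-element list holding
            # the attached message; we do not descend into it further.
            found.append(part.get_payload(0).as_bytes())
        elif part.is_multipart():
            stack.extend(reversed(part.get_payload()))
    return found


def learn_forwarded(raw, run=subprocess.run):
    """Feed every attached original to `sa-learn --spam` via stdin.

    `run` is injectable (hypothetical convenience) so the logic can be
    tried out without SpamAssassin installed.
    """
    count = 0
    for original in extract_forwarded(raw):
        run(["sa-learn", "--spam"], input=original, check=True)
        count += 1
    return count
```

A wrapper invoked by the alias would read the incoming mail from stdin and
pass the bytes to learn_forwarded(). It has to run as the user that owns
the Bayes database, otherwise sa-learn trains the wrong one.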
As for training, you can also let SA just auto-learn. This has been quite
successful in the past, but nowadays, with 90% of our spam already
rejected before SA sees it, many messages don't get enough partial hits for
auto-learning. Therefore I will start feeding high-scoring spam from the
quarantine to sa-learn really soon (as soon as I've found time to put a
script together).
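
Such a script could look roughly like the following Python sketch. It is
not the script referred to above, only an illustration under some
assumptions: the quarantined files are plain RFC 822 messages, the score is
recorded in SpamAssassin's usual "X-Spam-Status: Yes, score=..." header
(a MailScanner setup typically writes its own header, so the header name
would need adjusting), and the threshold is whatever you consider safe.
All function names are hypothetical.

```python
import re
import subprocess
from email import message_from_binary_file
from pathlib import Path

# Matches "score=15.2" (or a negative score) inside the status header.
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def spam_score(path):
    """Return the score recorded in X-Spam-Status, or None if absent."""
    with open(path, "rb") as fh:
        msg = message_from_binary_file(fh)
    match = SCORE_RE.search(str(msg.get("X-Spam-Status", "")))
    return float(match.group(1)) if match else None


def high_scoring(quarantine, threshold):
    """List quarantined files whose recorded score is >= threshold."""
    picked = []
    for path in sorted(Path(quarantine).rglob("*")):
        if path.is_file():
            score = spam_score(path)
            if score is not None and score >= threshold:
                picked.append(path)
    return picked


def feed_to_sa_learn(paths, run=subprocess.run):
    """Hand the selected files to `sa-learn --spam` in one call.

    `run` is injectable so the selection can be checked without
    SpamAssassin installed.
    """
    if paths:
        run(["sa-learn", "--spam", *map(str, paths)], check=True)
    return len(paths)
```

Run from cron as the user owning the Bayes database, this would be called
as feed_to_sa_learn(high_scoring(quarantine_dir, 10.0)), with the
quarantine directory and the threshold of 10.0 being placeholders. Keep
the threshold well above your spam score so false positives in the
quarantine don't get learned as spam.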
Kai
--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com