Spamprob Training

Kai Schaetzl maillists at conactive.com
Mon May 22 12:31:16 IST 2006


Yadavenedra Awasthi wrote on Mon, 22 May 2006 14:46:06 +0530:

> What are the different training methods for spamprobe, e.g. global, user
> specific, any others if they exist? Which is the most preferable method and
> why, and what are their advantages and disadvantages? It can be achieved
> with procmail, an automatic script, or by asking users to manually mark
> mails as spam, but this is only possible if IMAP is used. If users use POP
> without leaving a copy on the server, then what is the way out?

You could have them mail the offending spam to a certain alias that sucks
these messages in and applies some additional steps to make sure that the
format of the mail is OK for sa-learn. I suggest you read the docs and the
wiki for SpamAssassin, since this is all SA territory and this kind of
question has been answered and discussed there (on the sa-talk list)
umpteen times.

It really depends on your needs. There are organizations where using the
IMAP junk folder learning method works very well, and there are quite a few
solutions for doing this, all of which have been explained on the sa-talk
list. There are others where it can't work, because most people retrieve
their mail locally.

The same goes for global vs. user-specific Bayes. It depends on how much
space you want to spend on this and how diverse the spam you are getting is
(and probably other things ;-). In my experience a global Bayes database
works just fine and is the better choice. The reasoning behind this is that
unless you (as a single person) get a *lot* of mail, you won't get enough
mail to train your Bayes well enough for it to help with spam detection.
User-specific Bayes can also take up quite a bit of space. But that's just
a theoretical baseline; it doesn't mean it's always the better solution.
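
For the alias approach, a minimal sketch in Python of the "additional steps" might look like the following. It assumes users forward spam as an attachment (a message/rfc822 part) rather than inline, because only then do the original headers survive. The function names and the spool directory are hypothetical; the script just recovers each attached message, one per file, so that sa-learn --spam can later be run on that directory.

```python
from __future__ import annotations

import email
import hashlib
from pathlib import Path


def extract_attached_spam(raw: bytes) -> list[bytes]:
    """Return the raw bytes of each message/rfc822 part in a forwarded report.

    Hypothetical helper: only top-level attachments are considered, so a
    spam that itself carries an attached message is not split further.
    """
    report = email.message_from_bytes(raw)
    found: list[bytes] = []
    if not report.is_multipart():
        # Spam pasted inline has lost its original headers; this sketch skips it.
        return found
    for part in report.get_payload():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding one message.
            found.append(part.get_payload(0).as_bytes())
    return found


def save_for_sa_learn(raw: bytes, spool_dir: Path) -> list[Path]:
    """Write each attached spam to its own file in spool_dir (hypothetical location)."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for data in extract_attached_spam(raw):
        # Name files by content hash so a re-forwarded spam overwrites its old copy.
        path = spool_dir / (hashlib.sha256(data).hexdigest() + ".eml")
        path.write_bytes(data)
        written.append(path)
    return written
```

A small wrapper called from an alias pipe (for example an /etc/aliases entry of the form "|/path/to/wrapper") could pass sys.stdin.buffer.read() to save_for_sa_learn, and a cron job could then run sa-learn --spam on the spool directory. Both the wrapper and its path are assumptions, not part of SpamAssassin.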

As for training, you can also let SA just auto-learn. This has been quite
successful in the past, but nowadays, when 90% of our spam is already
rejected before SA sees it, many messages don't get enough partial hits for
auto-learning. Therefore I will start feeding high-scoring spam from the
quarantine to sa-learn really soon (as soon as I've found time to put a
script together).
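
A script of the kind described above could be sketched as follows, under several assumptions: quarantined messages are stored as individual raw files somewhere under one directory, each carries its score in a header named X-Spam-Score, and "high-scoring" means at least 15.0. The header name, the threshold and the function names are all hypothetical; MailScanner's own score header and quarantine layout depend on your configuration. By default the script only builds the sa-learn command (dry run) instead of executing it.

```python
from __future__ import annotations

import subprocess
from email.parser import BytesHeaderParser
from pathlib import Path

SCORE_HEADER = "X-Spam-Score"  # hypothetical header name; adjust to your setup
MIN_SCORE = 15.0  # hypothetical cut-off for "high-scoring" spam


def spam_score(path: Path, header: str = SCORE_HEADER) -> float | None:
    """Read only the headers of a quarantined file and return its score, if any."""
    with path.open("rb") as fp:
        headers = BytesHeaderParser().parse(fp)
    value = headers.get(header)
    if value is None:
        return None
    try:
        # Take the first token, e.g. "20.5" from "20.5 (high)".
        return float(str(value).strip().split()[0])
    except (ValueError, IndexError):
        return None


def select_high_scoring(quarantine: Path, min_score: float = MIN_SCORE,
                        header: str = SCORE_HEADER) -> list[Path]:
    """Walk the whole quarantine tree and pick files scoring at least min_score."""
    picked: list[Path] = []
    for path in sorted(quarantine.rglob("*")):
        if not path.is_file():
            continue
        score = spam_score(path, header)
        if score is not None and score >= min_score:
            picked.append(path)
    return picked


def feed_sa_learn(paths: list[Path], dry_run: bool = True) -> list[str]:
    """Build (and, unless dry_run, run) an 'sa-learn --spam' call on the given files."""
    cmd = ["sa-learn", "--spam", *map(str, paths)]
    if paths and not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Keeping the threshold well above the normal spam cut-off is the point of the design: only messages that are almost certainly spam get learned, which reduces the risk of teaching Bayes a false positive.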


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com





More information about the MailScanner mailing list