SA-Learn

Fri Apr 18 21:37:02 IST 2008

Vernon Webb wrote:
> This may be a silly question to some, but I would really like to learn 
> more about what sa-learn does. I have created a folder on my server that 
> I move all my SPAM mail to. Mind you this is only SPAM that is NOT 
> labeled as SPAM. Should I be moving all mail, even mail that is labeled 
> as such as well? And exactly what does this do? I assume that it somehow 
> trains MailScanner that this is SPAM, but how? Does it tell it that the 
> mail addressed and IPs that this mails come from are sending bad mail? 
> Is it only local to my server? Does it report these emails as SPAM to 
> some RBL? Please pardon the intrusion if taken as such, I am only trying 
> to better understand how MailScanner works.

sa-learn trains the Bayes database used by SpamAssassin, and that database is 
local to your server. It doesn't report to RBLs, Razor, or anything else. 
That's what spamassassin -r is for.

As for feeding, I would strongly suggest not making any consideration other than 
"is this spam or not" when choosing whether to feed a message to sa-learn --spam. 
That includes spam that was already labeled correctly.

If you only feed it false negatives (spam that got through unlabeled), you 
introduce a bias into your Bayes database. Over time that will cause you to miss 
some of the spam you are currently detecting.

I'd also suggest feeding some nonspam emails to sa-learn with the --ham 
parameter, instead of the --spam parameter.
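
As a rough sketch of what that looks like in practice, assuming you keep 
hand-sorted spam and nonspam in two Maildir folders (the paths below are made 
up, so substitute your own). The run helper is only for illustration: by 
default it prints each command so you can check the paths, and it executes 
them when DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical folder locations (assumptions, not taken from your setup).
SPAM_DIR="${SPAM_DIR:-$HOME/Maildir/.Spam/cur}"
HAM_DIR="${HAM_DIR:-$HOME/Maildir/.Ham/cur}"

# Illustrative helper: print the command unless DRY_RUN=0, then run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Everything that is spam, labeled or not, goes in as spam.
run sa-learn --spam "$SPAM_DIR"

# A sample of legitimate mail goes in as ham.
run sa-learn --ham "$HAM_DIR"
```

Run it as the user that owns the Bayes database your scanner actually uses, 
otherwise you end up training a different database.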

In general, it's best to give sa-learn a realistic, well-balanced diet from your 
email stream. Obviously it would be difficult to hand-classify and train on every 
message you receive, but that would be the theoretical ideal. Head in that 
direction as far as you can without causing yourself undue stress or hassle.
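
To see how balanced your training has been so far, sa-learn --dump magic 
prints the counters the Bayes database keeps, including nspam and nham. A small 
sketch, assuming the usual column layout where the count is the third field and 
the counter name is the last one; count_of is a made-up helper name.

```shell
#!/bin/sh
# Illustrative helper: pull one counter ("nspam" or "nham") out of
# `sa-learn --dump magic` output read from stdin. Assumes the count is
# the third whitespace-separated field and the name is the last one.
count_of() {
    awk -v key="$1" '$NF == key { print $3 }'
}

# Only query the database if sa-learn is actually installed.
if command -v sa-learn >/dev/null 2>&1; then
    dump="$(sa-learn --dump magic)"
    echo "spam learned: $(printf '%s\n' "$dump" | count_of nspam)"
    echo "ham learned:  $(printf '%s\n' "$dump" | count_of nham)"
fi
```

As far as I recall, the defaults (bayes_min_spam_num and bayes_min_ham_num) are 
200 each, and SpamAssassin does not use Bayes at all until both are reached. So 
if nham is still near zero, the spam you have fed is not doing anything yet.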