Training spamassassin Bayes

DAve dave.list at pixelhammer.com
Thu Aug 17 16:49:56 IST 2006


Casey T. Deccio wrote:
> On Tue, 2006-08-15 at 21:53 -0700, Casey T. Deccio wrote:
>> I'm using a Debian system with 
>> Exim4/MailScanner/Spamassassin/Courier-imap.  Using the default 
>> Spamassassin settings (including auto-learn), about half of the SPAM 
>> emails were incorrectly classified as ham.  I recently created a script 
>> (see below) to run daily as a cron job, but the Spam classification has 
>> only gotten worse since then.  Any ideas?
> 
> Okay, let me simplify.  Should there be any problem with me doing
> training using sa-learn as root while also doing auto training (turned
> on by default--at least in Debian)?  Spam classification has gotten
> extremely poor since I began doing that.
> 
> Casey
> 
> 

That is how I have been doing it, with no problems so far. Spam tagging is 
better than ever. This is the first time I've used Bayes and it hasn't 
been more trouble than it is worth, so I am very happy.

I am configured like so, though I will be moving my bayes onto my 
ramdisk soon.

MailScanner.conf:
Rebuild Bayes Every = 86400
Wait During Bayes Rebuild = no

spam.assassin.prefs.conf:
bayes_path /usr/local/etc/MailScanner/bayes/bayes
bayes_file_mode 0770
bayes_auto_learn 1
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
bayes_ignore_header X-Account_key
bayes_ignore_header X-UIDL
bayes_ignore_header X-Mozilla-Status
bayes_ignore_header X-Mozilla-Status2


Perms are:
bash-2.05b# ls -la | less
total 2462018
drwxr-xr-x  2 root  cvs     38912 Aug 17 11:43 .
dr-xr-xr-x  8 root  cvs      1024 Aug  8 14:36 ..
-rw----rw-  1 root  cvs     10632 Aug 17 11:45 bayes.mutex
-rw-rw----  1 root  cvs     78120 Aug 17 11:45 bayes_journal
-rw-rw----  1 root  cvs  10190848 Aug 17 11:45 bayes_seen
-rw-rw----  1 root  cvs  10174464 Aug 17 11:45 bayes_toks
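
If you want to compare your own files against a listing like this, here is 
a rough Python sketch of one way to check whether the group has read and 
write access, which is what matters when more than one user (root running 
sa-learn, plus whatever user MailScanner runs as) touches the same Bayes 
database. The function name and the dictionary are made up for 
illustration; the modes are simply the ones from the listing above, typed 
in as octal.

```python
import stat


def group_can_read_write(mode):
    """Return True if the group permission bits include both read and write."""
    needed = stat.S_IRGRP | stat.S_IWGRP
    return (mode & needed) == needed


# Modes transcribed by hand from the ls output above (illustrative only).
listing = {
    "bayes.mutex": 0o606,    # -rw----rw-
    "bayes_journal": 0o660,  # -rw-rw----
    "bayes_seen": 0o660,     # -rw-rw----
    "bayes_toks": 0o660,     # -rw-rw----
}

for name, mode in listing.items():
    print(name, group_can_read_write(mode))
```

To run it against real files rather than typed-in modes, 
stat.S_IMODE(os.stat(path).st_mode) gives the same kind of value for a 
given path.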


What does your reporting say? If you train on an "insert favorite spam 
here" message and then see more of them come through later, are they 
showing Bayes scores?
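
One quick way to check is to look for SpamAssassin's BAYES_nn rule names 
(BAYES_00 through BAYES_99) in the spam report of the messages that got 
through. Below is a minimal Python sketch of that check. The sample header 
value is invented for illustration and is not taken from a real message; 
the exact header name and layout depend on your MailScanner settings, so 
treat the format as an assumption.

```python
import re

# SpamAssassin's Bayes rules are named BAYES_ followed by two digits.
BAYES_RULE = re.compile(r"\bBAYES_\d{2}\b")


def bayes_rules(header_value):
    """Return every BAYES_nn rule name found in a spam report string."""
    return BAYES_RULE.findall(header_value)


# Hypothetical report text, made up for this example.
sample = "spam, SpamAssassin (score=8.1, required 5, BAYES_99 3.50, HTML_MESSAGE 0.00)"

print(bayes_rules(sample))
```

If the list comes back empty for messages like the ones you trained on, 
Bayes was probably not consulted for them, which would point at the 
database the scanner reads rather than at the training itself.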

DAve

-- 
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.

