Training spamassassin Bayes

Casey T. Deccio casey at deccio.net
Thu Aug 17 17:40:05 IST 2006


On Thu, 2006-08-17 at 11:49 -0400, DAve wrote:
> Casey T. Deccio wrote:
> > Should there be any problem with me doing
> > training using sa-learn as root while also doing auto training (turned
> > on by default--at least in Debian)?  Spam classification has gotten
> > extremely poor since I began doing that.
> > 
> > 

> spam.assassin.prefs.conf;
> bayes_path /usr/local/etc/MailScanner/bayes/bayes
> bayes_file_mode 0770
> bayes_auto_learn 1
> bayes_ignore_header X-MailScanner
> bayes_ignore_header X-MailScanner-SpamCheck
> bayes_ignore_header X-MailScanner-SpamScore
> bayes_ignore_header X-MailScanner-Information
> bayes_ignore_header X-Account_key
> bayes_ignore_header X-UIDL
> bayes_ignore_header X-Mozilla-Status
> bayes_ignore_header X-Mozilla-Status2
> 

MailScanner.conf seems to be okay.  However, in spam.assassin.prefs.conf
I seem to have had my bayes_ignore_header lines misconfigured, so they
didn't match the X-MailScanner-* headers in MailScanner.conf.

Could this be tainting my spam training (significantly)?  If so, do I
need to clear out the old data from my bayes database and start over?

Also, should I add certain client headers to this list (e.g., evolution,
mozilla, or whatever)?
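
To see which headers the ignore list misses, something like the following could be run against a delivered message. It is only a sketch: the function name uncovered_headers and the header X-Example-MailScanner-SpamCheck are made up for illustration, and comparing header names case-insensitively is my assumption, not something I have confirmed SpamAssassin does.

```python
from email import message_from_string


def uncovered_headers(prefs_text, raw_message, prefix="X-"):
    """List headers in raw_message that start with prefix and are not
    named by any bayes_ignore_header line in prefs_text.

    Hypothetical helper for illustration; names are compared
    case-insensitively, which is an assumption.
    """
    ignored = set()
    for line in prefs_text.splitlines():
        parts = line.split()
        # Only plain "bayes_ignore_header <name>" lines are considered.
        if len(parts) == 2 and parts[0] == "bayes_ignore_header":
            ignored.add(parts[1].lower())

    msg = message_from_string(raw_message)
    wanted = prefix.lower()
    return sorted(
        {
            name
            for name in msg.keys()
            if name.lower().startswith(wanted) and name.lower() not in ignored
        }
    )


if __name__ == "__main__":
    prefs = (
        "bayes_ignore_header X-MailScanner\n"
        "bayes_ignore_header X-MailScanner-SpamCheck\n"
    )
    # Hypothetical message whose scanner headers carry a site-specific name.
    raw = (
        "From: a@example.com\n"
        "X-MailScanner: Found to be clean\n"
        "X-Example-MailScanner-SpamCheck: not spam\n"
        "Subject: hi\n"
        "\n"
        "body\n"
    )
    print(uncovered_headers(prefs, raw))
```

Any header this reports is one Bayes would still tokenize, so it is a candidate for another bayes_ignore_header line.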

> Perms are,
> bash-2.05b# ls -la | less
> total 2462018
> drwxr-xr-x  2 root  cvs     38912 Aug 17 11:43 .
> dr-xr-xr-x  8 root  cvs      1024 Aug  8 14:36 ..
> -rw----rw-  1 root  cvs     10632 Aug 17 11:45 bayes.mutex
> -rw-rw----  1 root  cvs     78120 Aug 17 11:45 bayes_journal
> -rw-rw----  1 root  cvs  10190848 Aug 17 11:45 bayes_seen
> -rw-rw----  1 root  cvs  10174464 Aug 17 11:45 bayes_toks

bash-2.05b# ls -la | less
-rw-------  1 Debian-exim Debian-exim   651264 2006-08-17 09:21 auto-whitelist
-rw-rw-rw-  1 root        root           27084 2006-08-17 06:31 bayes.mutex
-rw-------  1 Debian-exim Debian-exim  1290240 2006-08-17 07:41 bayes_seen
-rw-------  1 Debian-exim Debian-exim 10522624 2006-08-17 09:29 bayes_toks
-rw-------  1 Debian-exim Debian-exim  1294336 2006-07-21 17:46 bayes_toks.expire10036
-rw-------  1 Debian-exim Debian-exim  1409024 2006-07-17 09:16 bayes_toks.expire10080
-rw-------  1 Debian-exim Debian-exim  1445888 2006-07-15 01:11 bayes_toks.expire10092
...
[many more bayes_toks.expire* files]
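
The listing above mixes owners: bayes.mutex belongs to root while the databases belong to Debian-exim. Whether that is related to the poor classification is not established, but a quick way to spot such mismatches is sketched below. The function name files_not_owned_by is hypothetical, and the directory and uid to check are whatever applies on the system in question.

```python
import os


def files_not_owned_by(directory, expected_uid):
    """Return names of regular files in directory whose owner uid
    differs from expected_uid (hypothetical helper, illustration only)."""
    mismatched = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and other non-regular entries.
        if os.path.isfile(path) and os.stat(path).st_uid != expected_uid:
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    # Check the current directory against the uid running this script.
    print(files_not_owned_by(".", os.getuid()))
```

Run from the Bayes directory with the uid of the user the scanner runs as, this would list files such as a root-owned bayes.mutex.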

> 
> What does your reporting say? If you train an "insert favorite spam here" 
> message and then see more of them come through later are they showing 
> Bayes scores?

At first glance, no, but I'll need to monitor from here on to see.

Casey



