Training spamassassin Bayes
Casey T. Deccio
casey at deccio.net
Thu Aug 17 17:40:05 IST 2006
On Thu, 2006-08-17 at 11:49 -0400, DAve wrote:
> Casey T. Deccio wrote:
> > Should there be any problem with me doing
> > training using sa-learn as root while also doing auto training (turned
> > on by default--at least in Debian)? Spam classification has gotten
> > extremely poor since I began doing that.
> >
> >
> spam.assassin.prefs.conf;
> bayes_path /usr/local/etc/MailScanner/bayes/bayes
> bayes_file_mode 0770
> bayes_auto_learn 1
> bayes_ignore_header X-MailScanner
> bayes_ignore_header X-MailScanner-SpamCheck
> bayes_ignore_header X-MailScanner-SpamScore
> bayes_ignore_header X-MailScanner-Information
> bayes_ignore_header X-Account_key
> bayes_ignore_header X-UIDL
> bayes_ignore_header X-Mozilla-Status
> bayes_ignore_header X-Mozilla-Status2
>
MailScanner.conf seems to be okay. However, in spam.assassin.prefs.conf
I seem to have had my bayes_ignore_header lines misconfigured, so they
didn't match the X-MailScanner-* headers in MailScanner.conf.
Could this be tainting my spam training (significantly)? If so, do I
need to clear out the old data from my bayes database and start over?
Also, should I add certain client headers to this list (e.g., evolution,
mozilla, or whatever)?
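To check whether the ignore list really covers the headers MailScanner adds, something like the following Python sketch could compare the two files. It is only an illustration: the function names are made up here, and it assumes that MailScanner.conf names its headers on lines whose option name ends in "Header" (for example "Spam Header = X-%org-name%-MailScanner-SpamCheck:"), with %org-name% standing for the site name. Both assumptions should be verified against the real files.

```python
def ignored_headers(prefs_text):
    """Return the lower-cased header names found on bayes_ignore_header lines."""
    names = set()
    for line in prefs_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(parts) == 2 and parts[0] == "bayes_ignore_header":
            names.add(parts[1].lower())
    return names


def mailscanner_headers(conf_text, org_name):
    """Return the lower-cased X- header names configured in MailScanner.conf.

    Assumption: header options look like 'Some Header = X-%org-name%-Name:'.
    """
    names = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if not key.strip().lower().endswith("header"):
            continue
        value = value.strip().rstrip(":").replace("%org-name%", org_name)
        # Skip yes/no style options such as 'Add Envelope From Header = yes'.
        if value.lower().startswith("x-"):
            names.add(value.lower())
    return names


def missing_ignores(prefs_text, conf_text, org_name):
    """List MailScanner headers that have no matching bayes_ignore_header line."""
    wanted = mailscanner_headers(conf_text, org_name)
    return sorted(wanted - ignored_headers(prefs_text))
```

Calling missing_ignores() with the contents of both files and the site's %org-name% value would list any header that Bayes is still learning from. An empty list means the two configurations agree.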
> Perms are,
> bash-2.05b# ls -la | less
> total 2462018
> drwxr-xr-x 2 root cvs 38912 Aug 17 11:43 .
> dr-xr-xr-x 8 root cvs 1024 Aug 8 14:36 ..
> -rw----rw- 1 root cvs 10632 Aug 17 11:45 bayes.mutex
> -rw-rw---- 1 root cvs 78120 Aug 17 11:45 bayes_journal
> -rw-rw---- 1 root cvs 10190848 Aug 17 11:45 bayes_seen
> -rw-rw---- 1 root cvs 10174464 Aug 17 11:45 bayes_toks
bash-2.05b# ls -la | less
-rw------- 1 Debian-exim Debian-exim   651264 2006-08-17 09:21 auto-whitelist
-rw-rw-rw- 1 root        root           27084 2006-08-17 06:31 bayes.mutex
-rw------- 1 Debian-exim Debian-exim  1290240 2006-08-17 07:41 bayes_seen
-rw------- 1 Debian-exim Debian-exim 10522624 2006-08-17 09:29 bayes_toks
-rw------- 1 Debian-exim Debian-exim  1294336 2006-07-21 17:46 bayes_toks.expire10036
-rw------- 1 Debian-exim Debian-exim  1409024 2006-07-17 09:16 bayes_toks.expire10080
-rw------- 1 Debian-exim Debian-exim  1445888 2006-07-15 01:11 bayes_toks.expire10092
...
[many more bayes_toks.expire* files]
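Since the listing mixes root-owned and Debian-exim-owned files, a quick ownership check may be useful after training as root. The Python sketch below is only an illustration, with hypothetical function names; it reports the owner uid and permission bits of each bayes* file in a directory and flags the case where more than one user owns them.

```python
import os
import stat


def bayes_file_report(directory, prefix="bayes"):
    """Return (name, owner uid, permission bits) for each file starting with prefix."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith(prefix):
            continue
        st = os.stat(os.path.join(directory, name))
        rows.append((name, st.st_uid, stat.S_IMODE(st.st_mode)))
    return rows


def mixed_owners(rows):
    """True if the reported files belong to more than one uid."""
    return len({uid for _, uid, _ in rows}) > 1


def print_report(directory):
    """Print one line per file, then a warning if ownership is mixed."""
    rows = bayes_file_report(directory)
    for name, uid, mode in rows:
        print(f"{name:30} uid={uid:<6} mode={mode:04o}")
    if mixed_owners(rows):
        print("warning: bayes files are owned by more than one user")
```

A warning here would suggest that manual and automatic training are writing as different users, which is worth ruling out before blaming the header configuration.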
>
> What does your reporting say? If you train a "insert favorite spam here"
> message and then see more of them come through later are they showing
> Bayes scores?
At first glance, no, but I'll need to monitor from here on to see.
Casey