Debugging Spamassassin
Matt Kettler
mkettler at EVI-INC.COM
Tue Apr 29 17:02:22 IST 2003
At 09:16 AM 4/29/2003 -0500, Marco Obaid wrote:
>Hi,
>
>Running "spamassassin -D --lint", one of the output lines looks like this:
>debug: bayes corpus size: nspam = 14120, nham = 20635
>What does "corpus" mean? Does nspam mean the number of spam messages
>detected by SA so far, or the number learned so far?
That's in the context of the bayes engine, so it's strictly the number of
spam and nonspam (ham) learned, not the total processed. (SA only
auto-learns at more extreme scores than the general spam/nonspam threshold.)
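If you want to track those learned counts over time, a minimal Python sketch for pulling them out of the debug line might look like this. The pattern is based only on the single line quoted above, so treat the exact wording as an assumption; other SA versions may print it differently. The function name parse_corpus_size is made up for illustration.

```python
import re

# Assumption: the debug line has the exact shape quoted in the question.
CORPUS_RE = re.compile(r"bayes corpus size: nspam = (\d+), nham = (\d+)")


def parse_corpus_size(line):
    """Return (nspam, nham) from a debug line, or None if it does not match."""
    match = CORPUS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = "debug: bayes corpus size: nspam = 14120, nham = 20635"
nspam, nham = parse_corpus_size(line)
# These are learned messages only, not everything SA has scanned.
print(f"learned spam: {nspam}, learned ham: {nham}")
```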
>Finally, can I safely delete *.db files in my /var/spool/spamassassin?
>The time stamp on those files never changed for 2 months:
>
>-rw------- 1 root root 134324224 Feb 10 00:17 auto-whitelist.db
>-rw-r--r-- 1 root root 103570 Apr 29 09:06 bayes_journal
>-rw-r--r-- 1 root root 361 Apr 29 09:06 bayes_msgcount
>-rw------- 1 root root 2613248 Apr 29 09:06 bayes_seen
>-rw------- 1 root root 327680 Feb 25 08:28 bayes_seen.db
>-rw-r--r-- 1 root root 3895296 Apr 29 09:06 bayes_toks
>-rw------- 1 root root 8720384 Feb 25 08:28 bayes_toks.db
>-rw-r--r-- 1 root root 1218 Apr 27 18:45 user_prefs
All of those files (except user_prefs) can safely be deleted, provided you
shut down any SA processes first. Those files are really only used to
store data about the past trends of email SA has seen, so if they are
deleted, it will build new ones from a clean slate. Worst case here is
you lose your bayes training.
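As a rough sketch of that cleanup, the Python below lists the regenerable state files in a directory and only deletes them when asked to. It assumes the state files can be recognized by the name prefixes seen in the listing above (auto-whitelist and bayes_), which may not hold on other setups. The function names are hypothetical, and it does not stop SA for you; do that first.

```python
from pathlib import Path

# Assumption: prefixes taken from the directory listing in the question.
STATE_PREFIXES = ("auto-whitelist", "bayes_")
# user_prefs is configuration, not generated state, so it is never touched.
KEEP = {"user_prefs"}


def state_files(directory):
    """Return the sorted paths of regenerable state files in directory."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file()
        and p.name not in KEEP
        and p.name.startswith(STATE_PREFIXES)
    )


def remove_state_files(directory, dry_run=True):
    """List the state files, deleting them only when dry_run is False."""
    targets = state_files(directory)
    for path in targets:
        if dry_run:
            print(f"would remove {path}")
        else:
            path.unlink()
    return targets
```

Calling remove_state_files("/var/spool/spamassassin") with the default dry_run=True only prints what would go, which is a cheap way to check the list before passing dry_run=False.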
However, user_prefs is not dynamically generated. But if you're calling SA
via MailScanner, it doesn't use a user_prefs and instead uses its own
spam.assassin.prefs.conf.
So you might want to look at user_prefs, but the rest of the files are just
state data for the auto-whitelist (something you should NOT use with
MailScanner without thinking about the implications of score smearing) and
the bayes tokenizer.