Debugging SpamAssassin

Tue Apr 29 17:02:22 IST 2003

At 09:16 AM 4/29/2003 -0500, Marco Obaid wrote:
>Hi,
>
>Running "spamassassin -D --lint", one of the output lines looks like this:
>debug: bayes corpus size: nspam = 14120, nham = 20635
>What does "corpus" mean? Does nspam mean the number of spam messages
>detected by SA so far, or the number learned so far?

That's in the context of the bayes engine: the "corpus" is the set of
messages it has been trained on, so those are strictly the numbers of spam
and nonspam (ham) messages learned, not the total processed. (SA only
auto-learns at more extreme scores than the general spam/nonspam threshold.)
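
If you want to track those counts over time, a small script can pull them out of the debug output. This is only a sketch: the pattern is based on the single debug line quoted above, the function name parse_corpus_size is made up for illustration, and other SA versions may format the line differently.

```python
import re

# Pattern for the debug line quoted above. It is derived from that one
# example only; other SpamAssassin versions may print it differently.
CORPUS_RE = re.compile(r"bayes corpus size: nspam = (\d+), nham = (\d+)")


def parse_corpus_size(line):
    """Return (nspam, nham) as ints, or None if the line does not match."""
    match = CORPUS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = "debug: bayes corpus size: nspam = 14120, nham = 20635"
nspam, nham = parse_corpus_size(line)
print(f"learned spam: {nspam}, learned ham: {nham}, total: {nspam + nham}")
```

Both numbers count learned messages only, so the total is the size of the training corpus, not the amount of mail SA has scanned.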

>Finally, can I safely delete *.db files in my /var/spool/spamassassin?
>The time stamp on those files never changed for 2 months:
>
>-rw-------    1 root     root     134324224 Feb 10 00:17 auto-whitelist.db
>-rw-r--r--    1 root     root       103570 Apr 29 09:06 bayes_journal
>-rw-r--r--    1 root     root          361 Apr 29 09:06 bayes_msgcount
>-rw-------    1 root     root      2613248 Apr 29 09:06 bayes_seen
>-rw-------    1 root     root       327680 Feb 25 08:28 bayes_seen.db
>-rw-r--r--    1 root     root      3895296 Apr 29 09:06 bayes_toks
>-rw-------    1 root     root      8720384 Feb 25 08:28 bayes_toks.db
>-rw-r--r--    1 root     root         1218 Apr 27 18:45 user_prefs

All of those files (except user_prefs) can safely be deleted, provided you
shut down any SA processes first. Those files are really only used to store
data about the past trends of email SA has seen, so if they are deleted, it
will build new ones from a clean slate. Worst case here is that you lose
your bayes training.
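
If you would rather keep a way back, you can move the state files aside instead of deleting them. The following is a rough sketch, not an SA tool: the file names come from the listing above, the function name move_state_aside and the backup location are hypothetical, and it assumes every SA process has already been stopped.

```python
import shutil
from pathlib import Path

# File names taken from the directory listing above.
# user_prefs is deliberately left out: it is configuration, not state.
STATE_FILES = [
    "auto-whitelist.db",
    "bayes_journal",
    "bayes_msgcount",
    "bayes_seen",
    "bayes_seen.db",
    "bayes_toks",
    "bayes_toks.db",
]


def move_state_aside(spool_dir, backup_dir):
    """Move SA state files from spool_dir into backup_dir.

    Returns the list of file names that were actually moved.
    Assumes all SA processes have been shut down beforehand.
    """
    spool = Path(spool_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in STATE_FILES:
        src = spool / name
        if src.is_file():
            shutil.move(str(src), str(backup / name))
            moved.append(name)
    return moved
```

You would call it as move_state_aside("/var/spool/spamassassin", "/some/backup/dir"), where the second path is whatever backup directory you choose. SA then starts from a clean slate, and moving the files back restores the old training.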

However, user_prefs is not dynamically generated, so deleting it loses
whatever settings it holds. If you're calling SA via MailScanner, though, it
doesn't use user_prefs at all and instead uses its own
spam.assassin.prefs.conf.

So you might want to look at user_prefs, but the rest of the files are just
state data for the auto-whitelist (something you should NOT use with
MailScanner without thinking about the implications of score smearing) and
the bayes tokenizer.