Typical Bayes Size?

Matt Kettler mkettler at evi-inc.com
Fri Jun 16 21:11:19 IST 2006


chardlist wrote:
> On a server that averages about 15,000 messages a day, with a mature bayes
> database that runs sa-learn --force-expire every night via cron, what should
> I expect as a typical number of tokens?
> 
> The last sa-learn --force-expire reported:
> 
> expired old bayes database entries in 11 seconds
> 1011642 entries kept, 5188 deleted
> token frequency: 1-occurrence tokens: 13.09%
> token frequency: less than 8 occurrences: 5.09%
> 
> Is over 1 million tokens normal or is something fishy going on?

That seems rather large.

Have you declared a bayes_expiry_max_db_size?

If you don't have one declared, the default is 150,000 tokens, which means that
SA should be aiming for roughly 112k tokens (75% of the maximum) when it does an
expire.
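
To make the arithmetic behind that 112k figure explicit, here is a rough sketch
in Python. It assumes the rule as I understand it from the SA configuration
docs: an expire keeps either 75% of bayes_expiry_max_db_size or 100,000 tokens,
whichever is larger. The function name is made up for illustration and is not
part of SA.

```python
def expiry_target(max_db_size=150_000):
    """Approximate token count SA aims to keep after an expire.

    Assumption: SA keeps the larger of 75% of bayes_expiry_max_db_size
    and a floor of 100,000 tokens. This is a sketch, not SA's own code.
    """
    three_quarters = max_db_size * 3 // 4  # 75% using integer arithmetic
    return max(three_quarters, 100_000)


if __name__ == "__main__":
    # Default setting: 150,000 * 0.75 = 112,500 tokens.
    print(expiry_target())
```

By that yardstick, a database holding 1,011,642 tokens after an expire is
roughly nine times the default target.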

That said, there are conditions in which SA will end up with a much larger
database, but this generally only affects young databases.


> 
> When I lint the spamassassin config it reports:
> 
> bayes: corpus size: nspam = 713810, nham = 212860

*shrug* that part's not very useful. It's the total count of all mail ever
trained (i.e. this counter never goes down due to expiry), so it says nothing
about how many tokens are currently in the database.


One thing that might be useful is the output of "sa-learn --dump magic". Looking
at the spread of the various atimes can be helpful.
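
If you want to eyeball the atime spread quickly, something like the following
Python sketch may help. It assumes the usual "--dump magic" layout, where each
line ends in "non-token data: <label>" and the value sits in the third
whitespace-separated column. The function names are hypothetical, and the two
atime values in SAMPLE are made-up placeholders, not real output (the nspam,
nham and ntokens values are the ones quoted in this thread).

```python
SECONDS_PER_DAY = 86400

# Illustrative input only: the atimes below are placeholders.
SAMPLE = """\
0.000          0     713810          0  non-token data: nspam
0.000          0     212860          0  non-token data: nham
0.000          0    1011642          0  non-token data: ntokens
0.000          0 1149000000          0  non-token data: oldest atime
0.000          0 1150000000          0  non-token data: newest atime
"""


def parse_magic(text):
    """Return {label: int value} for each 'non-token data' line.

    Assumes the value is in the third column of the dump output.
    """
    magic = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        columns = fields.split()
        if len(columns) < 3:
            continue  # skip anything that does not match the assumed layout
        magic[label.strip()] = int(columns[2])
    return magic


def atime_spread_days(magic):
    """Days between the oldest and newest token access times."""
    return (magic["newest atime"] - magic["oldest atime"]) / SECONDS_PER_DAY


if __name__ == "__main__":
    magic = parse_magic(SAMPLE)
    print("ntokens:", magic["ntokens"])
    print("atime spread (days): %.1f" % atime_spread_days(magic))
```

To use it on a real database, feed it the text produced by
"sa-learn --dump magic" instead of SAMPLE.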

> 
> Thanks for any advice/reassurance,
> -Brendan
> 



More information about the MailScanner mailing list