Typical Bayes Size?

chardlist chardlist at chard.net
Mon Jun 19 14:37:38 IST 2006

I haven't specified a bayes_expiry_max_db_size. I run Bayes through MySQL; in
the bayes_seen table I have 946,511 records. In bayes_tokens there are about 182K.

Really, I just want to make sure MS is running as efficiently as possible.
The 182K tokens I have seem more in line with the 112K you said below was
closer to normal.

Any thoughts?


-----Original Message-----
From: mailscanner-bounces at lists.mailscanner.info
[mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf Of Matt
Sent: Friday, June 16, 2006 3:11 PM
To: MailScanner discussion
Subject: Re: Typical Bayes Size?

chardlist wrote:
> On a server that averages about 15,000 messages a day, with a mature bayes
> database that runs sa-learn --force-expire every night via cron, what
> should I expect as a typical number of tokens?
> The last sa-learn --force-expire reported:
> expired old bayes database entries in 11 seconds
> 1011642 entries kept, 5188 deleted
> token frequency: 1-occurrence tokens: 13.09%
> token frequency: less than 8 occurrences: 5.09%
> Is over 1 million tokens normal or is something fishy going on?
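As a quick sanity check on the expire output quoted above, here is a minimal Python sketch that pulls the kept/deleted counts out of the summary line and works out what share of the tokens the nightly expire actually removed. The function name is hypothetical, and the regex only assumes the summary-line wording shown in that output.

```python
import re

# Summary line copied from the sa-learn --force-expire output quoted above.
REPORT = "1011642 entries kept, 5188 deleted"


def parse_expire_summary(line: str) -> tuple[int, int]:
    """Return (kept, deleted) from an expire summary line (hypothetical helper)."""
    match = re.search(r"(\d+) entries kept, (\d+) deleted", line)
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return int(match.group(1)), int(match.group(2))


kept, deleted = parse_expire_summary(REPORT)
# Fraction of the pre-expire token count that this run removed.
share_deleted = deleted / (kept + deleted)
print(kept, deleted)            # 1011642 5188
print(f"{share_deleted:.2%}")   # 0.51%
```

An expire that trims only about half a percent of a million-token database leaves it far above the default target, which is consistent with the question of whether something is off.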

That seems rather large.

Have you declared a bayes_expiry_max_db_size?

If you don't have one declared, the default is 150k, which means that SA will
be aiming for 112k tokens when it does an expire.
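To make the arithmetic explicit, here is a small Python sketch. It assumes the expire run aims for roughly 75% of bayes_expiry_max_db_size, which is the ratio implied by the 150k and 112k figures in this reply. The constant and function names are hypothetical. The two token counts plugged in are the 1,011,642 and ~182K figures from this thread.

```python
# Assumption: an expire run trims the database to about 75% of
# bayes_expiry_max_db_size (150,000 by default), i.e. the "150k -> 112k"
# relationship described in this reply. Names below are hypothetical.
DEFAULT_MAX_DB_SIZE = 150_000
EXPIRE_TARGET_RATIO = 0.75


def expire_target(max_db_size: int = DEFAULT_MAX_DB_SIZE) -> int:
    """Approximate number of tokens an expire run aims to keep."""
    return int(max_db_size * EXPIRE_TARGET_RATIO)


def size_vs_target(token_count: int, max_db_size: int = DEFAULT_MAX_DB_SIZE) -> float:
    """How many times larger the token count is than the expire target."""
    return token_count / expire_target(max_db_size)


print(expire_target())                      # 112500
print(round(size_vs_target(1_011_642), 1))  # 9.0
print(round(size_vs_target(182_000), 1))    # 1.6
```

On that assumption, the original million-token database was about nine times the target, while 182K tokens is only about 1.6 times it.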

That said, there are conditions in which SA will end up with a much larger
database, but this generally only affects young databases.

> When I lint the spamassassin config it reports:
> bayes: corpus size: nspam = 713810, nham = 212860

*shrug* That part's not very useful. It's the total count of all mail ever
trained (i.e., this counter never goes down due to expiry).

One thing that might be useful is the output of "sa-learn --dump magic".
Looking at the spread of the various atimes can be helpful.
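One way to read that spread is to turn the oldest and newest atimes into a number of days. Below is a minimal Python sketch. It assumes the usual "--dump magic" layout, in which each line ends in "non-token data: <name>" and the value sits in the third numeric column. The sample values are invented for illustration, and the helper names are hypothetical.

```python
# Hypothetical sample: the layout mimics "sa-learn --dump magic" output,
# but every number here is invented for illustration.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0     182000          0  non-token data: ntokens
0.000          0 1148108000          0  non-token data: oldest atime
0.000          0 1150700000          0  non-token data: newest atime
"""


def parse_magic(text: str) -> dict[str, int]:
    """Map each 'non-token data' label to its value (assumed third column)."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        numbers, label = line.split("non-token data:", 1)
        columns = numbers.split()
        values[label.strip()] = int(columns[2])
    return values


def atime_spread_days(values: dict[str, int]) -> float:
    """Days between the oldest and newest token access times (Unix seconds)."""
    return (values["newest atime"] - values["oldest atime"]) / 86400


magic = parse_magic(SAMPLE)
print(magic["ntokens"])            # 182000
print(atime_spread_days(magic))    # 30.0
```

In practice you would feed it the real command output instead of SAMPLE. A spread far wider than expected would suggest expiry isn't removing old tokens.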

> Thanks for any advice/reassurance,
> -Brendan

MailScanner mailing list
mailscanner at lists.mailscanner.info

Before posting, read http://wiki.mailscanner.info/posting
