Typical Bayes Size?

chardlist chardlist at chard.net
Fri Jun 16 04:58:26 IST 2006

On a server that averages about 15,000 messages a day, with a mature Bayes
database and sa-learn --force-expire run every night via cron, how many
tokens should I expect the database to hold?

The last sa-learn --force-expire reported:

expired old bayes database entries in 11 seconds
1011642 entries kept, 5188 deleted
token frequency: 1-occurrence tokens: 13.09%
token frequency: less than 8 occurrences: 5.09%

Is over 1 million tokens normal, or is something fishy going on?

When I lint the SpamAssassin config, it reports:

bayes: corpus size: nspam = 713810, nham = 212860
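
To put those figures side by side, here is a rough back-of-the-envelope
sketch in Python. The variable names are mine, and it assumes the nspam/nham
counts and the 15,000-a-day average describe the same mail stream, which may
not be exactly true:

```python
# Figures copied from the sa-learn and lint output above.
MESSAGES_PER_DAY = 15_000   # approximate daily average (assumed steady)
TOKENS_KEPT = 1_011_642
TOKENS_DELETED = 5_188
NSPAM = 713_810
NHAM = 212_860


def summarize():
    """Return a few ratios derived from the reported numbers."""
    total_learned = NSPAM + NHAM
    tokens_before_expiry = TOKENS_KEPT + TOKENS_DELETED
    return {
        # Messages the database has learned from in total.
        "total_learned": total_learned,
        # Share of learned messages that were spam.
        "spam_share": NSPAM / total_learned,
        # Days of mail the corpus would represent at the assumed daily rate.
        "days_of_mail": total_learned / MESSAGES_PER_DAY,
        # Fraction of tokens removed by the last forced expiry.
        "deleted_share": TOKENS_DELETED / tokens_before_expiry,
        # Tokens kept per learned message.
        "tokens_per_message": TOKENS_KEPT / total_learned,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

The part that makes me wonder is how small a share of the tokens the nightly
expiry actually removes.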

Thanks for any advice/reassurance,

