bayes expire tokens

Kai Schaetzl maillists at CONACTIVE.COM
Thu Mar 10 00:31:35 GMT 2005



Peter Bonivart wrote on Thu, 10 Mar 2005 00:11:47 +0100:

> But it seems to me that the bayes_seen is not touched by the expire 
> process, it just keeps growing. Am I wrong about that?
>

You are absolutely right, yes. Sorry. bayes_seen is not touched by sync or
expire. I thought it would get synced to the db, but it's apparently a
separate db. As far as I can tell it doesn't hold token data at all: it
maps message IDs to what each message was learned as (spam or ham), so
that a message isn't learned twice. The record of which tokens matched
when (last atime) goes to bayes_journal instead, and having that in a
separate file is what reduces writes to bayes_toks.
If bayes_seen is so big, this indicates that a lot of messages have been
learned. Could be a lot of reasons: very high mail flux, lots of spam
coming in and getting learned ... Since nothing ever removes old entries,
the file also still carries the records of messages whose tokens were
expired long ago.
If I remember correctly you got the expiry fixed by scheduling it, didn't
you? Then expiry can't be the cause, and it won't shrink this file either.
That would also explain how bayes_seen can end up bigger than the toks
file itself: bayes_toks gets trimmed at every expiry while bayes_seen
only grows. If your db spans only a few days, I think the difference
could become quite large (speculating).
Again, how much mail gets through? And what does sa-learn --dump magic
say about your token structure?
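
To put numbers on the size comparison, a small Python sketch along these
lines might help. It only looks at file sizes on disk. It assumes the
Bayes files are DBM files named bayes_toks and bayes_seen sitting in one
directory (by default ~/.spamassassin, otherwise wherever your bayes_path
points); the function names are made up for illustration.

```python
import os


def bayes_file_sizes(db_dir, names=("bayes_toks", "bayes_seen")):
    """Return {file name: size in bytes} for the Bayes files found in db_dir.

    Files that do not exist are simply left out of the result.
    """
    sizes = {}
    for name in names:
        path = os.path.join(db_dir, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def seen_to_toks_ratio(sizes):
    """Return size(bayes_seen) / size(bayes_toks), or None if not computable."""
    toks = sizes.get("bayes_toks", 0)
    seen = sizes.get("bayes_seen")
    if seen is None or toks == 0:
        return None
    return seen / toks
```

Calling bayes_file_sizes(os.path.expanduser("~/.spamassassin")) and passing
the result to seen_to_toks_ratio() gives one figure to quote on the list; a
ratio above 1 means bayes_seen is the larger file.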
Really, you should take this to the SA list. It doesn't have anything to
do with MS unless the MS usage causes some kind of corruption in this
file, and either way the best approach is to ask the SA developers for
help identifying the problem. This is definitely a problem that needs
resolving; such a big file probably slows writes to it quite a bit.
I think I even remember taking part in some threads about big bayes_seen
files and what caused them, but I don't remember the details.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




