bayes expire tokens
Kai Schaetzl
maillists at CONACTIVE.COM
Thu Mar 10 00:31:35 GMT 2005
Peter Bonivart wrote on Thu, 10 Mar 2005 00:11:47 +0100:
> But it seems to me that the bayes_seen is not touched by the expire
> process, it just keeps growing. Am I wrong about that?
>
You are absolutely right, yes. Sorry. It's not touched by sync or expire.
I thought it would get synced to the db, but it's apparently a separate
db. It stores information about which tokens matched when (last atime).
Having it in a separate file reduces writes to bayes_toks. If that file is
so big, it means a lot of matching tokens are recorded in it. There could
be many reasons: very high mail volume, lots of tokens in the db, lots of
spam coming in and matching ... Given the problem with expiry, it's also
possible that the file is so big because it still carries match
information for old tokens that should have been expired (with their
respective bayes_seen records wiped as well).
However, if I remember correctly, you got the expiry fixed by scheduling
it, didn't you? Then I think it can't be the cause. It's still quite
unusual to have such a big file; it makes no sense for that file to be
bigger than the toks file itself, unless the tokens are all extremely
small, so that the information about them takes more space than the
actual tokens (I don't know if this is possible, I'm speculating). If that
is possible and nearly all of your tokens get matched between each expiry
(because your db spans only a few days), then yes, I think this would
theoretically be possible.
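
To put numbers on the size comparison above, a small script along these
lines could report how big each Bayes file is. It's only a sketch: the
directory (~/.spamassassin) and the three file names are assumptions based
on the usual default DBM setup, so adjust them to your own bayes_path. The
function names are made up for this example.

```python
import os

# Assumed file names: the ones a default DBM-based Bayes setup normally
# uses. Check your own bayes_path setting before relying on them.
BAYES_FILES = ("bayes_toks", "bayes_seen", "bayes_journal")


def bayes_file_sizes(directory):
    """Return {file name: size in bytes} for the Bayes files that exist."""
    sizes = {}
    for name in BAYES_FILES:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def seen_larger_than_toks(sizes):
    """True if bayes_seen is bigger than bayes_toks (the odd case above)."""
    return sizes.get("bayes_seen", 0) > sizes.get("bayes_toks", 0)


if __name__ == "__main__":
    # ~/.spamassassin is an assumption; point this at where your db lives.
    found = bayes_file_sizes(os.path.expanduser("~/.spamassassin"))
    for name, size in sorted(found.items()):
        print(f"{name}: {size / 1024 / 1024:.1f} MiB")
    print("bayes_seen larger than bayes_toks:", seen_larger_than_toks(found))
```

Running it from cron for a few days would also show whether bayes_seen
really keeps growing while bayes_toks stays roughly constant after each
expiry.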
Again, how much mail gets through? And what does "sa-learn --dump magic"
say about your token structure?
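
If you want to turn that dump into the figure that matters here (how many
days the token db spans), something like the following might do. It
assumes the layout I recall from the dump, where each line ends in
"non-token data: <label>" and the value sits in the third column, and that
the labels "oldest atime", "newest atime" and "ntokens" are present. Check
this against your own output; the helper names are invented for the
example.

```python
import subprocess


def read_magic():
    """Run 'sa-learn --dump magic' and return its output as text."""
    result = subprocess.run(
        ["sa-learn", "--dump", "magic"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_magic(text):
    """Map each 'non-token data' label to its integer value.

    Assumes the value is the third whitespace-separated column.
    """
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        if sep and len(fields) >= 3:
            values[label.strip()] = int(fields[2])
    return values


def token_span_days(values):
    """Days between the oldest and the newest token access time."""
    return (values["newest atime"] - values["oldest atime"]) / 86400.0
```

Calling token_span_days(parse_magic(read_magic())) as the user who owns
the Bayes db gives the span in days; a value of only a few days would fit
the scenario described above.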
Really, you should take this over to the SA list. It doesn't have
anything to do with MS, unless the MS usage causes some kind of corruption
in this file. Even then, the best way is to ask the SA developers for help
identifying the problem. This is definitely a problem that needs
resolving; such a big file probably slows writes to it quite a bit.
I think I even remember some threads about big bayes_seen files and what
caused them, in which I participated, but I don't remember any details.
Kai
--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org