SDBM (was: Minimum hardware capacity for 35k e-mail scans/day)

shuttlebox shuttlebox at gmail.com
Wed Nov 21 09:11:56 GMT 2007


On Nov 21, 2007 9:46 AM, Martin.Hepworth <martinh at solidstatelogic.com> wrote:
>
> Tried it and my box ground to a halt after 30 mins of trying to do the sa-learn --restore.
>
> Must be semaphores or something....couldn't even log in at the console. Luckily a CTRL-C on the sa-learn brought everything back.
>
> My bayes seems to be 330MB so that's kinda large, and the SDBM was growing slowly and had got to 260MB when it all ground to a halt....maybe my 2.8Piv and 1.5GB ram ain't enough ;-)

I have never understood why so many put so much effort into Bayes.
It's just one test out of many.

Unless I'm misreading the output from the expire option, around
50-75% of the tokens get deleted when I do my daily expire from
cron. If the database refreshes at that rate, what's the point of
trying to keep an old database at any cost? The content may not be
very old anyway. I haven't seen any real difference in accuracy
between a new db that has just started scoring after learning 200
spam and 200 ham, and a db that is over three months old. I have
never bothered with manual learning either; I have just trusted the
autolearning mechanism to do it for me. Maybe my Bayes gets fooled
more often than a "properly" trained one, but I doubt that can be
avoided anyway given how Bayes works, so I just lowered the negative
scoring of low probability Bayes hits.

If you can't convert your old db, just rename the old files and start
over with a new db. You can always go back. :-)
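
For what it's worth, the renaming can be scripted. This is only a
minimal sketch, assuming the Bayes files all sit in one directory and
share the usual bayes_ prefix (bayes_toks, bayes_seen and so on). The
function name and the .old- suffix are made up for illustration, and
the directory in the example call is a guess, so check bayes_path in
your own configuration. Stop MailScanner first so nothing holds the
files open.

```shell
#!/bin/sh
# Hypothetical helper: set the existing Bayes files aside so that a fresh
# database gets built on the next learn/scan.
# Assumption: the files are named bayes_* inside a single directory.
set_aside_bayes() {
    dir=$1      # directory holding the Bayes files
    stamp=$2    # suffix for the renamed copies, e.g. a date
    for f in "$dir"/bayes_*; do
        # Skip when the glob matched nothing.
        [ -f "$f" ] || continue
        # Leave files that were already set aside untouched.
        case $f in
            *.old-*) continue ;;
        esac
        mv "$f" "$f.old-$stamp"
    done
}

# Example call (the path is an assumption, adjust to your setup):
# set_aside_bayes "$HOME/.spamassassin" "$(date +%Y%m%d)"
```

Going back is just the reverse: stop MailScanner again, move the new
bayes_* files out of the way and rename the .old- copies back to their
original names.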

-- 
/peter

