Moving Bayes database to tmpfs

infernix infernix at infernix.net
Wed Jun 3 22:36:49 IST 2009


Kai Schaetzl wrote:
> I forgot to look at these data. These are low figures. If you already have 
> a performance problem, then not because of Bayes. As I said I'm sure going 
> to SQL is better than using tmpfs.

I have seen the exact opposite (and have read elsewhere that SQL is 
really very expensive CPU-wise for what the Bayes engine does in SA), 
but my use case is perhaps non-standard.

We have 4 nodes doing about 1 million messages a day. Before I moved the 
Bayes db to tmpfs, I had very large amounts of iowait, even though 
everything else (sendmail spool, MailScanner incoming+spool dirs) was 
already on tmpfs. The SATA disks in these boxes aren't great, but I 
had expected better performance. Apparently the number of concurrent 
messages we get at peak times is just too big to handle on disk platters 
(at least with single disks), so the move to tmpfs helped enormously.

In contrast, when I converted one of the Bayes dbs to SQL and 
configured all nodes to use one MySQL server, the MySQL box couldn't 
handle it. The net effect was that scanning messages took 5-15 seconds 
longer than before.

Right now, with everything in tmpfs, I am running 40 children on each 
box and iowait during peak hours is 0-1%. I could increase children if I 
wanted to but the current mail volume does not warrant that.

For data protection (it is tmpfs after all) I have written an init 
script that backs up the tmpfs on shutdown and restores it on bootup. 
I'm also making an emergency-backup.tar.gz of the tmpfs folder every 15 
minutes, to be used if the server crashes: at bootup I check for a 
shutdown-generated tar, and if it's not there I fall back to the 
emergency-backup tar.
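
A minimal sketch of the backup/restore logic described above, not the 
actual init script. The directory paths, the shutdown archive name and 
the start/stop/emergency arguments are all assumptions made up for 
illustration; only emergency-backup.tar.gz is a name taken from the 
description. Deleting the shutdown archive after a successful restore is 
also an assumption, made so that a later crash falls back to the 
emergency archive instead of a stale shutdown one.

```shell
#!/bin/sh
# Sketch only: paths and the shutdown archive name are hypothetical.
BAYES_DIR="${BAYES_DIR:-/var/spool/bayes-tmpfs}"   # assumed tmpfs mount point
BACKUP_DIR="${BACKUP_DIR:-/var/backups/bayes}"     # assumed on-disk directory
SHUTDOWN_TAR="$BACKUP_DIR/shutdown-backup.tar.gz"
EMERGENCY_TAR="$BACKUP_DIR/emergency-backup.tar.gz"

# Archive the tmpfs contents to $1. Write to a temporary file first so a
# crash in the middle of a backup never replaces a good archive with a
# truncated one.
backup_to() {
    mkdir -p "$BACKUP_DIR" &&
    tar -czf "$1.tmp" -C "$BAYES_DIR" . &&
    mv -f "$1.tmp" "$1"
}

# Print the archive to restore from: the clean-shutdown archive if it
# exists, otherwise the periodic emergency archive. Fails if neither exists.
pick_restore_source() {
    if [ -f "$SHUTDOWN_TAR" ]; then
        echo "$SHUTDOWN_TAR"
    elif [ -f "$EMERGENCY_TAR" ]; then
        echo "$EMERGENCY_TAR"
    else
        return 1
    fi
}

# Restore at boot, then remove the shutdown archive (assumption, see above).
restore() {
    src=$(pick_restore_source) || return 1
    mkdir -p "$BAYES_DIR" &&
    tar -xzf "$src" -C "$BAYES_DIR" &&
    rm -f "$SHUTDOWN_TAR"
}

case "${1:-}" in
    start)     restore ;;
    stop)      backup_to "$SHUTDOWN_TAR" ;;
    emergency) backup_to "$EMERGENCY_TAR" ;;   # e.g. run from cron every 15 minutes
esac
```

With a layout like this, the init system would call the script with 
"start" and "stop", and a cron job would call it with "emergency".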

Just my $0.02.

