MailScanner and Spamassassin

Kevin Miller Kevin_Miller at ci.juneau.ak.us
Sat Oct 20 00:54:20 IST 2012


-----Original Message-----
From: mailscanner-bounces at lists.mailscanner.info [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf Of Peter Bonivart
Sent: Friday, October 19, 2012 3:09 PM
To: MailScanner discussion
Subject: Re: MailScanner and Spamassassin

On Sat, Oct 20, 2012 at 12:39 AM, Kevin Miller <Kevin_Miller at ci.juneau.ak.us> wrote:
> Lots of training bound up in the old database.

> How do you figure? Have you looked at the retention time of individual 
> records in the db? The refresh rate of the entire database is fairly high. 
> I doubt you get much better results after the first few days even.

I suppose it all depends on how much mail you get.  We only get between 5,000 and 10,000 messages a day and have 400-500 users.  Not a lot compared to an ISP that might be getting hundreds of thousands of messages a day.  My primary mail server reports this:
  Number of Tokens:	221,102
I expect it would take some time to get that many tokens if I was starting from scratch.  And even if it did build within a couple of days, that's a couple of days during which the filtering could be working better.  Every little bit helps.  No idea how big Mike's mail volume is.
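
For anyone who wants to check their own numbers: the token count is one of the "non-token data" counters that `sa-learn --dump magic` prints.  Below is a minimal Python sketch for pulling those counters out of that output.  The function name `parse_bayes_magic` is made up for illustration, the column layout is assumed to match what SpamAssassin 3.x prints, and the nspam/nham figures in the sample are invented (only ntokens matches the number quoted above).

```python
import re

# Hypothetical sample in the layout `sa-learn --dump magic` is assumed to print.
# The nspam/nham values are invented; ntokens is the figure quoted in this post.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      10000          0  non-token data: nspam
0.000          0      20000          0  non-token data: nham
0.000          0     221102          0  non-token data: ntokens
"""

# Columns: probability, spam count, ham count (holds the value here), atime, label.
MAGIC_LINE = re.compile(
    r"\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s*(.+?)\s*$"
)


def parse_bayes_magic(text):
    """Return {label: value} for each 'non-token data' line in the dump."""
    counters = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


if __name__ == "__main__":
    stats = parse_bayes_magic(SAMPLE)
    print(f"tokens: {stats['ntokens']:,}")
```

On a live box you would feed it the real command output instead of `SAMPLE`, run as whichever user owns the Bayes database.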

> I've always been skeptical about this myth surrounding Bayes. It's just one
> test out of a thousand and it's easily fooled as well. I trust the auto-learner 
> to do its job and I haven't seen others get significantly better results 
> regardless of effort put into Bayes.

It's jack-simple to copy over the old Bayes database.  Hardly any effort at all really.  If it's from a current corpus it's going to help.  Maybe not an earth-shaking amount, but SpamAssassin is all about lots of little incremental scores.  Might be the difference between a 4.95 score and a 5.0.  Nothing I told him should take longer than maybe five minutes.
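
One way to do the copy, sketched in Python around `sa-learn --backup` and `sa-learn --restore` (both are documented sa-learn options; the wrapper names `backup_bayes` and `restore_bayes` are hypothetical).  This assumes the commands run as the user that owns the Bayes database, and that the dump file is moved between servers by whatever means you prefer.  Note that a restore wipes whatever is already in the target database.

```python
import subprocess


def backup_bayes(dump_path, run=subprocess.run):
    """Dump the Bayes database to dump_path as plain text (run on the old server)."""
    with open(dump_path, "w") as out:
        # `sa-learn --backup` writes the dump to standard output.
        run(["sa-learn", "--backup"], stdout=out, check=True)


def restore_bayes(dump_path, run=subprocess.run):
    """Load a dump made by backup_bayes (run on the new server)."""
    # `sa-learn --restore` replaces the existing database contents.
    run(["sa-learn", "--restore", dump_path], check=True)
```

The `run` parameter only exists so the wrappers can be exercised without touching a real database.  Sites that point SpamAssassin at a non-default configuration would need to add the matching sa-learn options to both command lists.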

If you have any insights and tricks for getting a new build up and running efficiently as quickly as possible I'm sure we'd all love to hear them.

 ...Kevin
--
Kevin Miller
Network/email Administrator, CBJ MIS Dept.
155 South Seward Street
Juneau, Alaska 99801
Phone: (907) 586-0242, Fax: (907) 586-4500
Registered Linux User No: 307357


