MailScanner and Spamassassin
Kevin Miller
Kevin_Miller at ci.juneau.ak.us
Sat Oct 20 00:54:20 IST 2012
-----Original Message-----
From: mailscanner-bounces at lists.mailscanner.info [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf Of Peter Bonivart
Sent: Friday, October 19, 2012 3:09 PM
To: MailScanner discussion
Subject: Re: MailScanner and Spamassassin
On Sat, Oct 20, 2012 at 12:39 AM, Kevin Miller <Kevin_Miller at ci.juneau.ak.us> wrote:
> Lots of training bound up in the old database.
> How do you figure? Have you looked at the retention time of individual
> records in the db? The refresh rate of the entire database is fairly high.
> I doubt you get much better results after the first few days even.
I suppose it all depends on how much mail you get. We only get between 5,000 and 10,000 messages a day and have 400-500 users. That's not a lot compared to an ISP that might be getting hundreds of thousands of messages a day. My primary mail server reports this:
Number of Tokens: 221,102
I expect it would take some time to get that many tokens if I were starting from scratch. And even if it did build within a couple of days, that's a couple of days when the filtering could be a bit closer to optimal. Every little bit helps. No idea how big Mike's mail store volume is.
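For anyone who wants to check their own token count, here is a rough Python sketch that reads it from `sa-learn --dump magic`. It assumes the usual layout of that output, where the count sits in the third column of the line ending in "non-token data: ntokens". The function names are made up for illustration, and the figure I quoted above may have come from a different report.

```python
import subprocess
from typing import Optional


def parse_ntokens(dump_output: str) -> Optional[int]:
    """Pull the token count out of `sa-learn --dump magic` output.

    Assumes each magic line has four leading columns and that the
    third column holds the value, e.g.:
    0.000          0     221102          0  non-token data: ntokens
    """
    for line in dump_output.splitlines():
        if line.strip().endswith("non-token data: ntokens"):
            fields = line.split()
            return int(fields[2])
    return None  # no ntokens line found


def current_token_count() -> Optional[int]:
    """Run sa-learn as the current user and return the token count."""
    result = subprocess.run(
        ["sa-learn", "--dump", "magic"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ntokens(result.stdout)
```

Run `current_token_count()` as the user that owns the Bayes database, otherwise sa-learn will look at a different (possibly empty) store.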
> I've always been skeptical about this myth surrounding Bayes. It's just one
> test out of a thousand and it's easily fooled as well. I trust the auto-learner
> to do its job and I haven't seen others get significantly better results
> regardless of effort put into Bayes.
It's jack-simple to copy over the old Bayes database. Hardly any effort at all, really. If it's from a current corpus it's going to help. Maybe not an earth-shaking amount, but SpamAssassin is all about lots of little incremental scores. It might be the difference between a 4.95 score and a 5.0. Nothing I told him should take longer than maybe five minutes.
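As one possible way to do the copy (not necessarily the exact steps from my earlier message), sa-learn can dump the database to a text file with `--backup` and load it on the new box with `--restore`. A minimal Python wrapper might look like the sketch below. The helper names and the backup file name are hypothetical, and it assumes sa-learn is on the PATH and is run as the user that owns the Bayes database.

```python
import subprocess


def backup_command() -> list:
    """sa-learn writes the backup to stdout, so the caller redirects it."""
    return ["sa-learn", "--backup"]


def restore_command(backup_file: str) -> list:
    """sa-learn reads the backup from the file named on the command line."""
    return ["sa-learn", "--restore", str(backup_file)]


def backup_bayes(backup_file: str) -> None:
    """On the old server: dump the Bayes database to a text file."""
    with open(backup_file, "w") as out:
        subprocess.run(backup_command(), stdout=out, check=True)


def restore_bayes(backup_file: str) -> None:
    """On the new server: load the dumped database.

    Note that a restore replaces whatever Bayes data is already there.
    """
    subprocess.run(restore_command(backup_file), check=True)
```

Typical use would be `backup_bayes("bayes_backup.txt")` on the old machine, copy the file across, then `restore_bayes("bayes_backup.txt")` on the new one.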
If you have any insights or tricks for getting a new build up and running efficiently as quickly as possible, I'm sure we'd all love to hear them.
...Kevin
--
Kevin Miller
Network/email Administrator, CBJ MIS Dept.
155 South Seward Street
Juneau, Alaska 99801
Phone: (907) 586-0242, Fax: (907) 586-4500
Registered Linux User No: 307357