Mailscanner plus SpamAssassin and CRM114

Mon Mar 29 16:09:09 IST 2004

> -----Original Message-----
> From: MailScanner mailing list [mailto:MAILSCANNER at JISCMAIL.AC.UK]On
> Behalf Of Kai Schaetzl
> Sent: Monday, March 29, 2004 7:32 AM
> To: MAILSCANNER at JISCMAIL.AC.UK
> Subject: Re: Mailscanner plus SpamAssassin and CRM114
>
>
> Simon Brock wrote on Mon, 29 Mar 2004 12:03:19 +0000:
>
> > I have been comparing the quality of the spam rejection from
> > Spamassassin (with Bayes, Razor et al) vs the spam filter in my copy
> > of Eudora.  Despite teaching, SpamAssassin does not seem to come up
> > to the level of Eudora's filter.
> >
> > I have recently discovered CRM114 (crm114.sourceforge.net) which
> > claims and does produce very accurate results.  Using the distributed
> > learnt files, it was almost 100% accurate.
> >
>
> If you do that with SA you'll get the same result. Use a recent version of
> SA, use extra rules and configure it correctly. A Bayes engine of whatever
> type is not the solution to your problem, it's a single point of failure.
> The SA mailing-list can help you with correct configuration.
>
>

It's not a Bayes engine... It's a program written in the CRM language,
which was designed for filtering and diff-style processing. Calling it a
Bayes engine is like saying Perl is a Bayes engine because SA is a
Perl-based program. Both do much, much more than Bayes processing and, like
SA, much of CRM114 is built around regex (or something close to it)
processing. What is so different is the learning process, and the fact that
its "regex-close" matching lets it catch mutations of the same type of
message without changing or adding rules. Rather than =~ /big bodypart/i, it
would be match [sales pitch], where "sales pitch" has a learned value and
might pertain to nuts, bolts, or bigger body parts. CRM is the reverse of SA
in that its default is to deny and it learns from FPs. In fact, the author
states that feeding it large corpora of spam/ham is actually detrimental to
its accuracy, and recommends that you not even use his optional
spam.css/nonspam.css files and instead learn from a blank slate...
Supposedly your FPs will drop to near zero in 3 days or less on average,
whereas SA requires a baseline of 200 (recommended) spams before it will
even use the Bayes filtering.
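
To make the "default deny, learn only from mistakes" idea concrete, here is
a toy sketch in Python. It is NOT CRM114's actual algorithm or the CRM
language; it is just a tiny word-count classifier I made up to show the
training style. The names TinyFilter and train_on_error, and the sample
messages, are all hypothetical.

```python
import math
from collections import Counter


class TinyFilter:
    """Toy word-count classifier (illustrative only, not CRM114)."""

    def __init__(self):
        # Start from a blank slate: no pre-learned spam/ham data.
        self.counts = {"spam": Counter(), "ham": Counter()}

    def _score(self, text, label):
        # Log-likelihood of the words under one class, with add-one smoothing.
        c = self.counts[label]
        total = sum(c.values())
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        return sum(
            math.log((c[w] + 1) / (total + vocab))
            for w in text.lower().split()
        )

    def classify(self, text):
        # Default deny: the message is spam unless ham clearly wins.
        if self._score(text, "ham") > self._score(text, "spam"):
            return "ham"
        return "spam"

    def learn(self, text, label):
        self.counts[label].update(text.lower().split())


def train_on_error(filt, stream):
    """Learn a message only when the filter got it wrong."""
    corrections = 0
    for text, truth in stream:
        if filt.classify(text) != truth:
            filt.learn(text, truth)
            corrections += 1
    return corrections


if __name__ == "__main__":
    mail = [
        ("meeting agenda for monday", "ham"),
        ("buy cheap pills now", "spam"),
        ("monday meeting moved", "ham"),
        ("cheap pills buy today", "spam"),
    ]
    f = TinyFilter()
    print("corrections:", train_on_error(f, mail))
    print(f.classify("monday meeting agenda"))
```

With an empty filter everything is denied, so the only things it ever has to
learn in this run are the two false positives; the spam is never trained at
all. That is the contrast with bulk-feeding a large corpus up front.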

If you study the concept it looks pretty good, but there doesn't really
appear to be a reasonable way to implement it site-wide; it appears it
should be used per user... which kind of breaks the possibility of
integrating it in a virtual user environment. Imagine SA running without
having to use thousands of rules, huge Bayes databases, and all manner of
constantly changing local rules. Imagine if it read and understood the
context of the email. That is more of a description of the goal of
mailfilter.crm.