More spam after spamassassin upgrade

Sat Jul 26 02:15:03 IST 2003

On Friday, Jul 25, 2003, at 17:21 US/Pacific, Chris Trudeau wrote:

>
>> I wonder when this is going to be resolved.  They haven't released
>> 2.60 yet, but from things I've seen, it looks like 2.60 is already obsolete
>> (spammers have already trained themselves to it).  When will the base
>> rule system go back to being a strong set of its own that doesn't
>> depend upon you using bayes (since that's not really an option here)?
>
> Why is this not an option???....

What process would you suggest I use for getting message feedback from
20,000 users?  We don't have individual config files/directories for
users on our mail servers (and even if we did, how would that interact
with messages to multiple users or mailing lists, neither of which are
expanded before they get to mailscanner?).  We don't have anyone on
staff who can review user submissions of false positives/false
negatives, and we are NOT going to blindly accept it when a user says
'this should have been spam', partially because users will not always
agree upon the issue.  And the things that auto-learning handles aren't
the things I'm worried about.  Auto-learning basically strengthens the
system's resolve about high-scoring spam; what I'm concerned about is
changing the scores of low-scoring spam.  When I call sa-learn on my
home machine, for example, I never call it on high-scoring spam
messages -- I call it upon messages that scored _lower_ than my
threshold.

I just don't see how bayes would fit into our situation.

Down the road, I'm looking into how to apply something I use at home
(where I have a "learn" and an "unlearn" folder, and my home mail
server automatically runs through those folders every night at 5am,
learning about things it did wrong) to our mail servers ... but at home
I've got _2_ users and at work I've got 10,000 times that many users.
At home, spamassassin is called out of my .forward, so there's never
confusion about whose bayes database to use, and there aren't any
non-user recipients like mailing lists.
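
A minimal sketch of the nightly routine described above, for a single
user.  It assumes that the "learn" folder holds spam that scored below
the threshold and that the "unlearn" folder holds legitimate mail that
was wrongly flagged; the post does not say which is which, so that
mapping is a guess.  The function names and the mail root path are
hypothetical.  The sa-learn options used (--spam, --ham, --mbox) are
standard ones.

```python
import subprocess
from pathlib import Path

# Assumed mapping (not stated in the post): "learn" = missed spam,
# "unlearn" = false positives that should be learned as ham.
FOLDER_FLAGS = {"learn": "--spam", "unlearn": "--ham"}


def build_sa_learn_commands(mail_root):
    """Return one sa-learn argv list per correction folder that exists."""
    commands = []
    for folder, flag in FOLDER_FLAGS.items():
        path = Path(mail_root) / folder
        if path.is_file():
            # A single file is treated as an mbox-format folder.
            commands.append(["sa-learn", flag, "--mbox", str(path)])
        elif path.is_dir():
            # A directory is passed as-is (one message per file).
            commands.append(["sa-learn", flag, str(path)])
    return commands


def run_nightly(mail_root):
    """Run sa-learn over each folder; raises if sa-learn fails."""
    for argv in build_sa_learn_commands(mail_root):
        subprocess.run(argv, check=True)
```

Something like run_nightly("/home/someuser/mail") could then be called
from a cron job scheduled at 5am.  Emptying the folders afterwards is
left out of the sketch.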

I'm not sure that the mechanism will translate well to my production
servers at work.  There's the issue of server load as it tries to
update 20,000 bayes databases (hopefully the low-usage window will be
long enough to let all of those updates happen before usage picks back
up).  There's adding a front end that expands all messages to 1 end-user
recipient per message before they get submitted to mailscanner (which
means more work for mailscanner, as mailscanner will now see 10
messages instead of 1 if the message has 10 recipients).  And there's
the issue of where to put the user data files.  If the mechanism does
translate, then using bayes will make sense.  Otherwise, I just don't
see how it will fit my environment.
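
The two load concerns above can be put into rough numbers.  This is a
back-of-the-envelope sketch only: the per-database update time and the
number of parallel jobs are made-up inputs, not measurements, and both
function names are hypothetical.

```python
def scanner_message_count(recipient_counts):
    """Messages the scanner sees once every message is split so that
    each copy has exactly one recipient."""
    return sum(recipient_counts)


def nightly_update_hours(num_databases, seconds_per_update, parallel_jobs=1):
    """Wall-clock hours needed to update every per-user bayes database,
    assuming each update takes the same time and jobs run evenly."""
    return num_databases * seconds_per_update / parallel_jobs / 3600


# One message with 10 recipients becomes 10 scanner messages.
print(scanner_message_count([10]))

# Hypothetical: 20,000 databases at 1 second each, one job at a time.
print(round(nightly_update_hours(20000, 1.0), 1))
```

With those assumed inputs the nightly pass takes about 5.6 hours, so
whether it fits depends entirely on how long the low-usage window is
and how many updates can safely run in parallel.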